Preface


... The numerical interpretation ... is however necessary. ... So long
as it is not obtained, the solutions may be said to remain incomplete and 
useless, and the truth which it is proposed to discover is no less hidden 
in the formulae of analysis than it was in the physical problem itself 

-Joseph Fourier, The Analytic Theory of Heat 


This book covers most of the standard topics in multivariate calculus, and a 
substantial part of a standard first course in linear algebra. The teacher may 
find the organization rather less standard. 

There are three guiding principles which led to our organizing the material 
as we did. One is that at this level linear algebra should be more a convenient 
setting and language for multivariate calculus than a subject in its own right. 
We begin most chapters with a treatment of a topic in linear algebra and then 
show how the methods apply to corresponding nonlinear problems. In each 
chapter, enough linear algebra is developed to provide the tools we need in 
teaching multivariate calculus (in fact, somewhat more: the spectral theorem 
for symmetric matrices is proved in Section 3.7). We discuss abstract vector 
spaces in Section 2.6, but the emphasis is on $\mathbb{R}^n$, as we believe that most
students find it easiest to move from the concrete to the abstract. 

Another guiding principle is that one should emphasize computationally ef- 
fective algorithms, and prove theorems by showing that those algorithms really 
work: to marry theory and applications by using practical algorithms as the- 
oretical tools. We feel this better reflects the way this mathematics is used 
today, in both applied and in pure mathematics. Moreover, it can be done with 
no loss of rigor. 

For linear equations, row reduction (the practical algorithm) is the central 
tool from which everything else follows, and we use row reduction to prove all 
the standard results about dimension and rank. For nonlinear equations, the 
cornerstone is Newton’s method, the best and most widely used method for 
solving nonlinear equations. We use Newton’s method both as a computational 
tool and as the basis for proving the inverse and implicit function theorems,
rather than basing those proofs on Picard iteration, which converges too slowly 
to be of practical interest. 


Jean Dieudonné, for many years a leader of Bourbaki, is the very
personification of rigor in mathematics. In his book Infinitesimal
Calculus, he put the harder proofs in small type, saying “... a beginner
will do well to accept plausible results without taxing his mind with
subtle proofs ...”

Following this philosophy, we 
have put many of the more diffi- 
cult proofs in the appendix, and 
feel that for a first course, these 
proofs should be omitted. Stu- 
dents should learn how to drive be- 
fore they learn how to take the car 
apart. 


In keeping with our emphasis on computations, we include a section on 
numerical methods of integration, and we encourage the use of computers
both to reduce tedious calculations (row reduction in particular) and as an
aid in visualizing curves and surfaces. We have also included a section on 
probability and integrals, as this seems to us too important a use of integration 
to be ignored. 

A third principle is that differential forms are the right way to approach the 
various forms of Stokes’s theorem. We say this with some trepidation, espe- 
cially after some of our most distinguished colleagues told us they had never 
really understood what differential forms were about. We believe that differ- 
ential forms can be taught to freshmen and sophomores, if forms are presented 
geometrically, as integrands that take an oriented piece of a curve, surface, or 
manifold, and return a number. We are aware that students taking courses 
in other fields need to master the language of vector calculus, and we devote 
three sections of Chapter 6 to integrating the standard vector calculus into the 
language of forms. 

The great conceptual simplification gained by doing electromagnetism in
the language of forms is a central motivation for using forms, and we will apply 
the language of forms to electromagnetism in a subsequent volume. 

Although most long proofs have been put in Appendix A, we made an excep- 
tion for the material in Section 1.6. These theorems in topology are often not 
taught, but we feel we would be doing the beginning student a disservice not 
to include them, particularly the mean value theorem and the theorems con- 
cerning convergent subsequences in compact sets and the existence of minima 
and maxima of functions. In our experience, students do not find this material 
particularly hard, and systematically avoiding it leaves them with an uneasy 
feeling that the foundations of the subject are shaky. 

Different ways to use the book 

This book can be used either as a textbook in multivariate calculus or as an 
accessible textbook for a course in analysis. 

We see calculus as analogous to learning how to drive, while analysis is 
analogous to learning how and why a car works. To use this book to “learn 
how to drive,” the proofs in Appendix A should be omitted. To use it to “learn 
how a car works,” the emphasis should be on those proofs. For most students, 
this will be best attempted when they already have some familiarity with the 
material in the main text. 

Students who have studied first year calculus only 

(1) For a one-semester course taken by students who have studied neither linear
algebra nor multivariate calculus, we suggest covering only the first four chap- 
ters, omitting the sections marked “optional,” which, in the analogy of learning 
to drive rather than learning how a car is built, correspond rather to learning
how to drive on ice. (These sections include the part of Section 2.8 concerning 
a stronger version of the Kantorovitch theorem, and Section 4.4 on measure 
0). Other topics that can be omitted in a first course include the proof of the 
fundamental theorem of algebra in Section 1.6, the discussion of criteria for 
differentiability in Section 1.9, Section 3.2 on manifolds, and Section 3.8 on 
the geometry of curves and surfaces. (In our experience, beginning students 
do have trouble with the proof of the fundamental theorem of algebra, while 
manifolds do not pose much of a problem.) 

(2) The entire book could also be used for a full year’s course. This could be 
done at different levels of difficulty, depending on the students’ sophistication 
and the pace of the class. Some students may need to review the material 
in Sections 0.3 and 0.5; others may be able to include some of the proofs in 
the appendix, such as those of the central limit theorem and the Kantorovitch 
theorem. 

(3) With a year at one’s disposal (and excluding the proofs in the appendix), 
one could also cover more than the present material, and a second volume is 
planned, covering 

applications of differential forms; 

abstract vector spaces, inner product spaces, and Fourier series; 

electromagnetism; 

differential equations; 

eigenvalues, eigenvectors, and differential equations. 

We favor this third approach; in particular, we feel that the last two topics 
above are of central importance. Indeed, we feel that three semesters would 
not be too much to devote to linear algebra, multivariate calculus, differential 
forms, differential equations, and an introduction to Fourier series and partial 
differential equations. This is more or less what the engineering and physics 
departments expect students to learn in second year calculus, although we feel 
this is unrealistic. 

Students who have studied some linear algebra or multivariate 
calculus 


The book can also be used for students who have some exposure to either 
linear algebra or multivariate calculus, but who are not ready for a course in 
analysis. We used an earlier version of this text with students who had taken 
a course in linear algebra, and feel they gained a great deal from seeing how 
linear algebra and multivariate calculus mesh. Such students could be expected 
to cover Chapters 1-6, possibly omitting some of the optional material discussed 
above. For a less fast-paced course, the book could also be covered in an entire
year, possibly including some proofs from the appendix. 

Students ready for a course in analysis 


We view Chapter 0 primarily 
as a resource for students, rather 
than as part of the material to be 
covered in class. An exception is 
Section 0.4, which might well be 
covered in a class on analysis. 


If the book is used as a text for an analysis course, then in one semester one 
could hope to cover all six chapters and some or most of the proofs in Appendix 
A. This could be done at varying levels of difficulty; students might be expected 
to follow the proofs, for example, or they might be expected to understand them 
well enough to construct similar proofs. Several exercises in Appendix A and 
in Section 0.4 are of this nature. 


Mathematical notation is not 
always uniform. For example, |A|
can mean the length of a matrix 
A (the meaning in this book) or 
it can mean the determinant of 
A. Different notations for partial 
derivatives also exist. This should 
not pose a problem for readers 
who begin at the beginning and 
end at the end, but for those who 
are using only selected chapters, 
it could be confusing. Notations 
used in the book are listed on the 
front inside cover, along with an 
indication of where they are first 
introduced. 


Numbering of theorems, examples, and equations 

Theorems, lemmas, propositions, corollaries, and examples share the same num- 
bering system. For example, Proposition 2.3.8 is not the eighth proposition of 
Section 2.3; it is the eighth numbered item of that section, and the first num- 
bered item following Example 2.3.7. We often refer back to theorems, examples, 
and so on, and hope this numbering will make them easier to find. 

Figures are numbered independently; Figure 3.2.3 is the third figure of Sec- 
tion 3.2. All displayed equations are numbered, with the numbers given at 
right; Equation 4.2.3 is the third equation of Section 4.2. When an equation 
is displayed a second time, it keeps its original number, but the number is in 
parentheses. 

We use the symbol △ to mark the end of an example or remark, and the
symbol □ to mark the end of a proof. 

Exercises 

Exercises are given at the end of each chapter, grouped by section. They range 
from very easy exercises intended to make the student familiar with vocabulary, 
to quite difficult exercises. The hardest exercises are marked with a star (or, in 
rare cases, two stars). On occasion, figures and equations are numbered in the 
exercises. In this case, they are given the number of the exercise to which they 
pertain. 

In addition, there are occasional “mini-exercises” incorporated in the text,
with answers given in footnotes. These are straightforward questions contain- 
ing no tricks or subtleties, and are intended to let the student test his or her 
understanding (or be reassured that he or she has understood). We hope that 
even the student who finds them too easy will answer them; working with pen 
and paper helps vocabulary and techniques sink in. 

Web page

Errata will be posted on the web page 

http://math.cornell.edu/~hubbard/vectorcalculus.

The three programs given in Appendix B will also be available there. We plan 
to expand the web page, making the programs available on more platforms, and 
adding new programs and examples of their uses. 

Readers are encouraged to write the authors at jhh8@cornell.edu to signal 
errors, or to suggest new exercises, which will then be shared with other readers 
via the web page. 

0 Preliminaries


0.0 Introduction 

This chapter is intended as a resource, providing some background for those 
who may need it. In Section 0.1 we share some guidelines that in our expe- 
rience make reading mathematics easier, and discuss a few specific issues like 
sum notation. Section 0.2 analyzes the rather tricky business of negating math- 
ematical statements. (To a mathematician, the statement “All seven-legged 
alligators are orange with blue spots” is an obviously true statement, not an 
obviously meaningless one.) Section 0.3 reviews set theory notation; Section 
0.4 discusses the real numbers; Section 0.5 discusses countable and uncountable 
sets and Russell’s paradox; and Section 0.6 discusses complex numbers. 

0.1 Reading Mathematics 

The most efficient logical order for a subject is usually different from the
best psychological order in which to learn it. Much mathematical writing
is based too closely on the logical order of deduction in a subject, with too
many definitions without, or before, the examples which motivate them,
and too many answers before, or without, the questions they address.

-William Thurston

Reading mathematics is different from other reading. We think the following 
guidelines can make it easier. First, keep in mind that there are two parts to 
understanding a theorem: understanding the statement, and understanding the 
proof. The first is more important than the second. 

What if you don’t understand the statement? If there’s a symbol in the
formula you don’t understand, perhaps a δ, look to see whether the next line
continues, “where δ is such-and-such.” In other words, read the whole sentence
before you decide you can’t understand it. In this book we have tried to define 
all terms before giving formulas, but we may not have succeeded everywhere. 

If you’re still having trouble, skip ahead to examples. This may contradict
what you have been told: that mathematics is sequential, and that you must
understand each sentence before going on to the next. In reality, although
mathematical writing is necessarily sequential, mathematical understanding is
not: you (and the experts) never understand perfectly up to some point and
not at all beyond. The “beyond,” where understanding is only partial, is an
essential part of the motivation and the conceptual background of the “here and
now.” You may often (perhaps usually) find that when you return to something
you left half-understood, it will have become clear in the light of the further
things you have studied, even though the further things are themselves obscure.

We recommend not spending much time on Chapter 0. In particular, if you
are studying multivariate calculus for the first time you should definitely
skip certain parts of Section 0.4 (Definition 0.4.4 and Proposition 0.4.6).
However, Section 0.4 contains a discussion of sequences and series which you
may wish to consult when we come to Section 1.5 about convergence and limits,
if you find you don’t remember the convergence criteria for sequences and
series from first year calculus.

The Greek Alphabet

Greek letters that look like Roman letters are not used as mathematical
symbols; for example, A is capital a, not capital α. The letter χ is
pronounced “kye,” to rhyme with “sky”; φ, ψ and ξ may rhyme with either
“sky” or “tea.”

    α      Α    alpha
    β      Β    beta
    γ      Γ    gamma
    δ      Δ    delta
    ε      Ε    epsilon
    ζ      Ζ    zeta
    η      Η    eta
    θ      Θ    theta
    ι      Ι    iota
    κ      Κ    kappa
    λ      Λ    lambda
    μ      Μ    mu
    ν      Ν    nu
    ξ      Ξ    xi
    ο      Ο    omicron
    π      Π    pi
    ρ      Ρ    rho
    σ      Σ    sigma
    τ      Τ    tau
    υ      Υ    upsilon
    φ, ϕ   Φ    phi
    χ      Χ    chi
    ψ      Ψ    psi
    ω      Ω    omega

Many students are very uncomfortable in this state of partial understanding, 
like a beginning rock climber who wants to be in stable equilibrium at all times. 
To learn effectively one must be willing to leave the cocoon of equilibrium. So 
if you don’t understand something perfectly, go on ahead and then circle back. 

In particular, an example will often be easier to follow than a general state- 
ment; you can then go back and reconstitute the meaning of the statement in 
light of the example. Even if you still have trouble with the general statement, 
you will be ahead of the game if you understand the examples. We feel so 
strongly about this that we have sometimes flouted mathematical tradition and 
given examples before the proper definition. 

Read with pencil and paper in hand, making up little examples for yourself
as you go on. 

Some of the difficulty in reading mathematics is notational. A pianist who 
has to stop and think whether a given note on the staff is A or F will not be 
able to sight-read a Bach prelude or Schubert sonata. The temptation, when 
faced with a long, involved equation, may be to give up. You need to take the 
time to identify the “notes.” 

Learn the names of Greek letters: not just the obvious ones like alpha, beta,
and pi, but the more obscure psi, xi, tau, omega. The authors know a
mathematician who calls all Greek letters “xi” (ξ), except for omega (ω), which he
calls “w.” This leads to confusion. Learn not just to recognize these letters, but
how to pronounce them. Even if you are not reading mathematics out loud, it
is hard to think about formulas if ξ, ψ, τ, φ are all “squiggles” to you.

Sum and product notation 

Sum notation can be confusing at first; we are accustomed to reading in one 
dimension, from left to right, but something like 

$$\sum_{k=1}^{n} a_{i,k} b_{k,j} \tag{0.1.1}$$

requires what we might call two-dimensional (or even three-dimensional)
thinking. It may help at first to translate a sum into a linear expression:

$$\sum_{i=0}^{\infty} 2^i = 2^0 + 2^1 + 2^2 + \cdots \tag{0.1.2}$$

or

$$\sum_{k=1}^{n} a_{i,k} b_{k,j} = a_{i,1} b_{1,j} + a_{i,2} b_{2,j} + \cdots + a_{i,n} b_{n,j}. \tag{0.1.3}$$

In Equation 0.1.3, the symbol $\sum_{k=1}^{n}$ says that the sum will have
$n$ terms. Since the expression being summed is $a_{i,k} b_{k,j}$, each of
those $n$ terms will have the form $ab$.
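
As a rough illustration of "translating a sum into a linear expression," the sketch below writes out the terms of Equations 0.1.2 and 0.1.3 one by one before adding them. The function names and the use of nested Python lists for the arrays a and b are assumptions made for this example, and Python indices start at 0 where the text's start at 1.

```python
def product_entry(a, b, i, j):
    """Return the sum over k of a[i][k] * b[k][j], as in Equation 0.1.3.

    a and b are lists of rows (hypothetical representation); indices are
    0-based here, whereas the text counts k from 1 to n.
    """
    n = len(b)  # the sum has n terms, one for each value of k
    terms = [a[i][k] * b[k][j] for k in range(n)]  # each term has the form "ab"
    return sum(terms)


def partial_sum_powers_of_two(m):
    """Return 2**0 + 2**1 + ... + 2**m, a finite piece of Equation 0.1.2."""
    return sum(2**i for i in range(m + 1))


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(product_entry(a, b, 0, 1))      # 1*6 + 2*8
    print(partial_sum_powers_of_two(3))   # 1 + 2 + 4 + 8
```

Building the list of terms first, rather than summing directly, mirrors the advice above: identify the individual terms, then add.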

Two $\sum$ placed side by side do not denote the product of two sums; one sum
is used to talk about one index, the other about another. The same thing could
be written with one $\sum$, with information about both indices underneath. For
example,

$$\begin{aligned}
\sum_{i=1}^{3} \sum_{j=2}^{4} (i+j) &= \sum_{\substack{i \text{ from 1 to 3,} \\ j \text{ from 2 to 4}}} (i+j) \\
&= ((1+2) + (1+3) + (1+4)) \\
&\quad + ((2+2) + (2+3) + (2+4)) \\
&\quad + ((3+2) + (3+3) + (3+4));
\end{aligned} \tag{0.1.4}$$

this double sum is illustrated in Figure 0.1.1.
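
The double sum can also be checked mechanically. The sketch below (names are illustrative, not from the text) lists the index pairs, confirms that there are nine of them, and adds up the terms; it also shows the same index bookkeeping for a product, using the standard library's `math.prod`.

```python
import math


def double_sum_terms():
    """Return the list of terms (i + j) for i from 1 to 3 and j from 2 to 4."""
    return [i + j for i in range(1, 4) for j in range(2, 5)]


def double_sum():
    """Add up all the terms of the double sum."""
    return sum(double_sum_terms())


def product_one_to_n(n):
    """Return 1 * 2 * ... * n, written as a product over the index i."""
    return math.prod(i for i in range(1, n + 1))


if __name__ == "__main__":
    print(len(double_sum_terms()))  # three values of i times three values of j
    print(double_sum())
    print(product_one_to_n(5), math.factorial(5))
```

Note that one comprehension with two `for` clauses plays the role of the single summation sign with both indices written underneath.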

Figure 0.1.1. In the double sum of Equation 0.1.4, each sum has three terms,
so the double sum has nine terms.

Rules for product notation are analogous to those for sum notation:

$$\prod_{i=1}^{n} a_i = a_1 \cdot a_2 \cdots a_n; \quad \text{for example, } \prod_{i=1}^{n} i = n!.$$

Proofs


When Jacobi complained that 
Gauss’s proofs appeared unmoti- 
vated, Gauss is said to have an- 
swered, You build the building and 
remove the scaffolding. Our sym- 
pathy is with Jacobi’s reply: he 
likened Gauss to the fox who 
erases his tracks in the sand with 
his tail. 


We said earlier that it is more important to understand a mathematical state- 
ment than to understand its proof. We have put some of the harder proofs in 
the appendix; these can safely be skipped by a student studying multivariate 
calculus for the first time. We urge you, however, to read the proofs in the main 
text. By reading many proofs you will learn what a proof is, so that (for one 
thing) you will know when you have proved something and when you have not. 

In addition, a good proof doesn’t just convince you that something is true; 
it tells you why it is true. You presumably don’t lie awake at night worrying 
about the truth of the statements in this or any other math textbook. (This 
is known as “proof by eminent authority” ; you assume the authors know what 
they are talking about.) But reading the proofs will help you understand the 
material. 

If you get discouraged, keep in mind that the content of this book represents 
a cleaned-up version of many false starts. For example, John Hubbard started 
by trying to prove Fubini’s theorem in the form presented in Equation 4.5.1. 
When he failed, he realized (something he had known and forgotten) that the 
statement was in fact false. He then went through a stack of scrap paper before 
coming up with a correct proof. Other statements in the book represent the 
efforts of some of the world’s best mathematicians over many years.

0.2 How to Negate Mathematical Statements

Even professional mathematicians have to be careful not to get confused 
when negating a complicated mathematical statement. The rules to follow are: 

(1) The opposite of 

[For all x, P(x) is true]

is [There exists x for which P(x) is not true].                       0.2.1

Above, P stands for “property.” Symbolically the same sentence is written:

$$\text{The opposite of } (\forall x)\, P(x) \text{ is } (\exists x) \mid \text{not } P(x). \tag{0.2.2}$$

Instead of using the bar | to mean “such that” we could write the last line
$(\exists x)(\text{not } P(x))$. Sometimes (not in this book) the symbols ~ and ¬ are used
to mean “not.”

(2) The opposite of 

[There exists x for which R(x) is true] 

0.2.3 

is [For all x, R(x) is not true].

Symbolically the same sentence is written:

$$\text{The opposite of } (\exists x)(P(x)) \text{ is } (\forall x) \text{ not } P(x). \tag{0.2.4}$$

These rules may seem reasonable and simple. Clearly the opposite of the 
(false) statement, “All rational numbers equal 1,” is the statement, “There 
exists a rational number that does not equal 1.” 

However, by the same rules, the statement, “All seven-legged alligators are 
orange with blue spots” is true, since if it were false, then there would exist a 
seven-legged alligator that is not orange with blue spots. The statement, “All 
seven-legged alligators are black with white stripes” is equally true. 
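
Over a finite collection, both negation rules and the alligator example can be checked directly. The sketch below is only an illustration on finite lists (the text's statements range over arbitrary sets); the helper names `for_all` and `exists` are ours, built on Python's `all` and `any`.

```python
from fractions import Fraction


def for_all(xs, p):
    """(∀x in xs) p(x)."""
    return all(p(x) for x in xs)


def exists(xs, p):
    """(∃x in xs) p(x)."""
    return any(p(x) for x in xs)


if __name__ == "__main__":
    rationals = [Fraction(1), Fraction(1, 2), Fraction(-3, 4)]  # a small sample
    equals_one = lambda q: q == 1

    # Rule (1): "not for all" agrees with "there exists one for which not".
    print(not for_all(rationals, equals_one),
          exists(rationals, lambda q: not equals_one(q)))

    # With no seven-legged alligators at all, every "for all" statement holds,
    # because no counter-example exists.
    seven_legged_alligators = []
    print(for_all(seven_legged_alligators, lambda a: a == "orange with blue spots"))
    print(for_all(seven_legged_alligators, lambda a: a == "black with white stripes"))
```

The last two lines show the point of the alligator example: a universal statement over an empty collection is true whatever property is claimed.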

In addition, mathematical statements are rarely as simple as “All rational 
numbers equal 1.” Often there are many quantifiers and even the experts have 
to watch out. At a lecture attended by one of the authors, it was not clear to 
the audience in what order the lecturer was taking the quantifiers; when he was 
forced to write down a precise statement, he discovered that he didn’t know 
what he meant and the lecture fell apart. 

Here is an example where the order of quantifiers really counts: in the
definitions of continuity and uniform continuity. A function f is continuous if for
all x, and for all ε > 0, there exists δ > 0 such that for all y, if |x − y| < δ, then
|f(x) − f(y)| < ε. That is, f is continuous if

$$(\forall x)(\forall \epsilon > 0)(\exists \delta > 0)(\forall y)\ \bigl(|x - y| < \delta \implies |f(x) - f(y)| < \epsilon\bigr). \tag{0.2.5}$$

Statements that to the ordinary mortal are false or meaningless are thus
accepted as true by mathematicians; if you object, the mathematician will
retort, “find me a counter-example.”

A function f is uniformly continuous if for all ε > 0, there exists δ > 0 such
that for all x and all y, if |x − y| < δ, then |f(x) − f(y)| < ε. That is, f is
uniformly continuous if

$$(\forall \epsilon > 0)(\exists \delta > 0)(\forall x)(\forall y)\ \bigl(|x - y| < \delta \implies |f(x) - f(y)| < \epsilon\bigr). \tag{0.2.6}$$

For the continuous function, we can choose different δ for different x; for the
uniformly continuous function, we start with ε and have to find a single δ that
works for all x.

For example, the function f(x) = x² is continuous but not uniformly
continuous: as you choose bigger and bigger x, you will need a smaller δ if you
want the statement |x − y| < δ to imply |f(x) − f(y)| < ε, because the function
keeps climbing more and more steeply. But sin x is uniformly continuous; you
can find one δ that works for all x and all y.
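
A small numerical sketch can make the difference visible. For f(x) = x² and x ≥ 0, solving (x + δ)² − x² = ε gives the largest δ that works at x, and that δ shrinks as x grows. For sin, the choice δ = ε works everywhere, since |sin x − sin y| ≤ |x − y|. Both function names and the sample points are assumptions made for this example.

```python
import math


def delta_for_square(x, eps):
    """Largest delta at the point x >= 0 such that
    |x - y| < delta implies |x**2 - y**2| < eps.

    Solves (x + delta)**2 - x**2 = eps, i.e. delta = sqrt(x**2 + eps) - x,
    rewritten to avoid subtracting two nearly equal numbers.
    """
    return eps / (math.sqrt(x * x + eps) + x)


def delta_for_sine(eps):
    """A single delta that works at every x for sin: delta = eps."""
    return eps


if __name__ == "__main__":
    eps = 0.1
    for x in (0, 1, 10, 100, 1000):
        # The admissible delta keeps shrinking: no single delta serves all x.
        print(x, delta_for_square(x, eps))
    print(delta_for_sine(eps))
```

The loop shows the order of quantifiers at work: for x², δ may depend on x as well as on ε; for sin, it depends on ε alone.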

0.3 Set Theory 

At the level at which we are working, set theory is a language, with a vocab- 
ulary consisting of seven words. In the late 1960’s and early 1970’s, under the 
sway of the “New Math,” they were a standard part of the elementary school 
curriculum, and set theory was taught as a subject in its own right. This was a 
resounding failure, possibly because many teachers, understandably not know- 
ing why set theory was being taught at all, made mountains out of molehills. As 
a result the schools (elementary, middle, high) have often gone to the opposite 
extreme, and some have dropped the subject altogether. 

The seven vocabulary words are 

∈            “is an element of”

{a | p(a)}   “the set of a such that p(a) is true”

⊂            “is a subset of” (or equals, when A ⊂ A)

∩            “intersect”: A ∩ B is the set of elements of both A and B.

∪            “union”: A ∪ B is the set of elements of either A or B
             or both.

×            “cross”: A × B is the set of pairs (a, b) with a ∈ A and
             b ∈ B.

−            “complement”: A − B is the set of elements in A that
             are not in B.
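
For finite sets, each of the seven vocabulary words has a direct counterpart in Python's built-in `set` type. The sketch below is only an illustration; the particular sets A and B are made up for the example.

```python
from itertools import product

A = {1, 2, 3}
B = {3, 4}

is_element = 3 in A                        # 3 ∈ A
evens = {a for a in A if a % 2 == 0}       # {a | a ∈ A and a is even}
is_subset = A <= A                         # A ⊂ A: a set is a subset of itself
intersection = A & B                       # A ∩ B
union = A | B                              # A ∪ B
cross = set(product(A, B))                 # A × B, as a set of pairs (a, b)
complement = A - B                         # A − B

if __name__ == "__main__":
    print(is_element, evens, is_subset)
    print(intersection, union, complement)
    print(sorted(cross))
```

Note that `<=` matches the text's convention that ⊂ allows equality; Python reserves `<` for proper subsets.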

One set has a standard name: the empty set ∅, which has no elements.
There are also sets of numbers that have standard names; they are written in 
black-board bold, a font we use only for these sets. Throughout this book and 
most other mathematics books (with the exception of N, as noted in the margin 
below), they have exactly the same meaning: 


There is nothing new about 
the concept of “set” denoted by 
{a|p(a)}. Euclid spoke of geo- 
metric loci , a locus being the set 
of points defined by some prop- 
erty. (The Latin word locus means 
“place” ; its plural is loci.) 

ℕ is for “natural,” ℤ is for “Zahl,” the German for number, ℚ is for
“quotient,” ℝ is for “real,” and ℂ is for “complex.” Mathematical notation
is not quite standard: some authors do not include 0 in ℕ.

When writing with chalk on a black board, it’s hard to distinguish between
normal letters and bold letters. Black-board bold font is characterized by
double lines, as in ℕ and ℝ.


ℕ   “the natural numbers” {0, 1, 2, ...}

ℤ   “the integers,” i.e., signed whole numbers {..., −1, 0, 1, ...}

ℚ   “the rational numbers” p/q, with p, q ∈ ℤ, q ≠ 0

ℝ   “the real numbers,” which we will think of as infinite decimals

ℂ   “the complex numbers” {a + ib | a, b ∈ ℝ}


Often we use slight variants of the notation above: {3, 5, 7} is the set consist- 
ing of 3, 5, and 7, and more generally, the set consisting of some list of elements 
is denoted by that list, enclosed in curly brackets, as in 

{n | n ∈ ℕ and n is even} = {0, 2, 4, ...},                          0.3.1

where again the vertical line | means “such that.”

The symbols are sometimes used backwards; for example, A ⊃ B means
B ⊂ A, as you probably guessed. Expressions are sometimes condensed:


{x ∈ ℝ | x is a square} means {x | x ∈ ℝ and x is a square},          0.3.2


i.e., the set of non-negative real numbers. 

A slightly more elaborate variation is indexed unions and intersections: if
$S_\alpha$ is a collection of sets indexed by $\alpha \in A$, then

$\bigcap_{\alpha \in A} S_\alpha$ denotes the intersection of all the $S_\alpha$, and

$\bigcup_{\alpha \in A} S_\alpha$ denotes their union.

For instance, if $l_n \subset \mathbb{R}^2$ is the line of equation y = n, then
$\bigcup_{n \in \mathbb{Z}} l_n$ is the set of points in the plane whose y-coordinate is an integer.

Although it may seem a bit pedantic, you should notice that
$\bigcup_{n \in \mathbb{Z}} l_n$ and $\{l_n \mid n \in \mathbb{Z}\}$ are not the same thing: the first
is a subset of the plane; an element of it is a point on one of the lines.
The second is a set of lines, not a set of points. This is similar to one of
the molehills which became mountains in the new-math days: telling the
difference between ∅ and {∅}, the set whose only element is the empty set.

We will use exponents to denote multiple products of sets; A × A × ⋯ × A
with n terms is denoted $A^n$: the set of n-tuples of elements of A.
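
The sketch below illustrates indexed unions and intersections, the difference between a union of lines and a set of lines, and the set $A^n$, all on small finite stand-ins. The family `S`, the truncation of each line to five points, and the variable names are assumptions made for this example; true lines in the plane are infinite sets.

```python
from itertools import product

# An indexed family S_a for a in {0, 1, 2}: here S_a = {a, a+1, ..., a+4}.
S = {a: set(range(a, a + 5)) for a in (0, 1, 2)}
intersection_of_family = set.intersection(*S.values())
union_of_family = set.union(*S.values())

# Finite stand-ins for the lines l_n (y = n): five points on each line.
lines = {n: frozenset((x, n) for x in range(-2, 3)) for n in (-1, 0, 1)}
union_of_lines = set().union(*lines.values())   # a set of points
set_of_lines = set(lines.values())              # a set of lines

# A^n: the set of n-tuples of elements of A.
A = {0, 1}
A_cubed = set(product(A, repeat=3))

if __name__ == "__main__":
    print(intersection_of_family, union_of_family)
    print(len(union_of_lines), len(set_of_lines))
    print(len(A_cubed))
```

The two lengths printed for the lines differ: the union contains points, while the set of lines contains only as many elements as there are lines.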

If this is all there is to set theory, what is the fuss about? For one thing, 
historically, mathematicians apparently did not think in terms of sets, and 
the introduction of set theory was part of a revolution at the end of the 19th 
century that included topology and measure theory. We explore another reason 
in Section 0.5, concerning infinite sets and Russell’s paradox. 


0.4 Real Numbers 


All of calculus, and to a lesser extent linear algebra, is about real numbers.
In this introduction, we will present real numbers, and establish some of their
most useful properties. Our approach privileges the writing of numbers in base
10; as such it is a bit unnatural, but we hope you will like our real numbers
being exactly the numbers you are used to. Also, addition and multiplication
will be defined in terms of finite decimals.

There are more elegant approaches to defining real numbers: Dedekind cuts,
for instance (see, for example, Michael Spivak, Calculus, second edition,
1980, pp. 554-572), or Cauchy sequences of rational numbers; one could also
mirror the present approach, writing numbers in any base, for instance 2.
Since this section is partially motivated by the treatment of floating-point
numbers on computers, base 2 would seem very natural.

Showing that all such constructions lead to the same numbers is a fastidious
exercise, which we will not pursue.


The least upper bound prop- 
erty of the reals is often taken as 
an axiom; indeed, it characterizes 
the real numbers, and it sits at 
the foundation of every theorem in 
calculus. However, at least with 
the description above of the reals, 
it is a theorem, not an axiom. 

The least upper bound sup X
is sometimes denoted l.u.b. X; the
notation max X is also sometimes
used, but suggests to some people
that max X ∈ X.


Numbers and their ordering 

By definition, the set of real numbers is the set of infinite decimals: expressions 
like 2.95765392045756 . . . , preceded by a plus or a minus sign (in practice the 
plus sign is usually omitted). The number that you usually think of as 3 is the 
infinite decimal 3.0000 . . . , ending in all zeroes. 

The following identification is absolutely vital: a number ending in all 9’s is 
equal to the “rounded up” number ending in all 0’s: 

$$0.34999999\ldots = 0.350000\ldots. \tag{0.4.1}$$

Also, +.0000... = −.0000.... Other than these exceptions, there is only one
way of writing a real number.
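
The text makes this identification by definition. As a plausibility check only, and assuming the geometric series formula from first year calculus (not yet established at this point in the text), the digits after the first two can be summed:

```latex
\begin{aligned}
0.34999\ldots &= 0.34 + \sum_{k=3}^{\infty} \frac{9}{10^{k}}
              = 0.34 + 9 \cdot \frac{10^{-3}}{1 - 10^{-1}} \\
              &= 0.34 + 9 \cdot \frac{1}{900}
              = 0.34 + 0.01 = 0.35 .
\end{aligned}
```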

Numbers that start with a + sign, except +0.000..., are positive; those
that start with a − sign, except −0.00..., are negative. If x is a real number,
then −x has the same string of digits, but with the opposite sign in front. For
k > 0, we will denote by $[x]_k$ the truncated finite decimal consisting of all the
digits of x before the decimal, and exactly k digits after the decimal. To avoid
ambiguity, if x is a real number with two decimal expressions, $[x]_k$ will be the
finite decimal built from the infinite decimal ending in 0’s; for the number in
Equation 0.4.1, $[x]_3 = 0.350$.
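
Truncation is easy to experiment with for rational numbers, which Python can represent exactly with `fractions.Fraction`. This is only a sketch: the function name `truncate` is ours, and a general real number (an infinite decimal) cannot be stored this way.

```python
from fractions import Fraction


def truncate(x, k):
    """Return [x]_k: all digits of x before the decimal point and exactly
    k digits after it, for an exactly represented rational x.

    int() discards the fractional part (rounding toward zero), so a negative
    number keeps the same digits as its positive counterpart.
    """
    scale = 10**k
    return Fraction(int(x * scale), scale)


if __name__ == "__main__":
    x = Fraction(35, 100)                    # the number of Equation 0.4.1
    print(truncate(x, 3))                    # exactly 0.350, printed as a fraction
    print(float(truncate(Fraction(295765, 100000), 2)))
    print(float(truncate(Fraction(-7, 4), 1)))
```

Because 0.35 is stored as the exact fraction 35/100, the sketch automatically follows the convention of using the expansion ending in 0's rather than the one ending in 9's.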

Given any two different numbers x and y, one is always bigger than the other.
This is defined as follows: if x is positive and y is non-positive, then x > y. If 
both are positive, then in their decimal expansions there is a first digit in which 
they differ; whichever has the larger digit in that position is larger. If both are 
negative, then x > y if -y > -x. 

The least upper bound property 

Definition 0.4.1 (Upper bound; least upper bound). A number a is
an upper bound for a subset X ⊂ ℝ if for every x ∈ X we have x ≤ a. A
least upper bound is an upper bound b such that for any other upper bound
a, we have b ≤ a. The least upper bound is denoted sup.

Theorem 0.4.2. Every non-empty subset X ⊂ ℝ that has an upper bound
has a least upper bound sup X.

Proof. We will construct successive decimals of sup X. Let us suppose that
x ∈ X is an element (which we know exists, since X ≠ ∅) and that a is an
upper bound. We will assume that x > 0 (the case x ≤ 0 is slightly different).
If x = a, we are done: the least upper bound is a.



Recall that $[a]_j$ denotes the finite decimal consisting of all the
digits of $a$ before the decimal, and $j$ digits after the decimal.

We use the symbol □ to mark the end of a proof, and the symbol
△ to denote the end of an example or a remark.


Because you learned to add, subtract, divide, and multiply in
elementary school, the algorithms used may seem obvious. But
understanding how computers simulate real numbers is not nearly
as routine as you might imagine. A real number involves an infinite
amount of information, and computers cannot handle such things:
they compute with finite decimals. This inevitably involves rounding
off, and writing arithmetic subroutines that minimize round-off
errors is a whole art in itself. In particular, computer addition and
multiplication are not associative. Anyone who really wants to
understand numerical problems has to take a serious interest in
“computer arithmetic.”
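The round-off effect described in this note can be seen in a few lines. The sketch below assumes Python floats (in practice IEEE 754 double precision); the helper names are illustrative only. It compares the two ways of grouping a sum of three numbers, first with floats and then with exact fractions.

```python
from fractions import Fraction


def left_grouping(a, b, c):
    """Compute (a + b) + c."""
    return (a + b) + c


def right_grouping(a, b, c):
    """Compute a + (b + c)."""
    return a + (b + c)


# With floating-point numbers the two groupings can round differently.
float_args = (0.1, 0.2, 0.3)
print(left_grouping(*float_args) == right_grouping(*float_args))

# With exact rational arithmetic the two groupings always agree.
exact_args = (Fraction(1, 10), Fraction(2, 10), Fraction(3, 10))
print(left_grouping(*exact_args) == right_grouping(*exact_args))
```

The exact computation is what the arithmetic of finite decimals in this section models; the floating-point computation is what a machine typically does.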


8 Chapter 0. Preliminaries 

If $x \ne a$, there is a first $j$ such that the $j$th digit of $x$ is smaller
than the $j$th digit of $a$. Consider all the numbers in $[x, a]$ that can be
written using only $j$ digits after the decimal, then all zeroes. This is a
finite non-empty set; in fact it has at most 10 elements, and $[a]_j$ is one of
them. Let $b_j$ be the largest which is not an upper bound. Now consider the
set of numbers in $[b_j, a]$ that have only $j + 1$ digits after the decimal
point, then all zeroes. Again this is a finite non-empty set, so you can choose
the largest which is not an upper bound; call it $b_{j+1}$. It should be clear
that $b_{j+1}$ is obtained by adding one digit to $b_j$. Keep going this way,
defining numbers $b_{j+2}, b_{j+3}, \dots$, each time adding one digit to the
previous number. We can let $b$ be the number whose $k$th decimal digit is the
same as that of $b_k$; we claim that $b = \sup X$.

Indeed, if there exists $y \in X$ with $y > b$, then there is a first digit $k$
of $y$ which differs from the $k$th digit of $b$, and then $b_k$ was not the
largest number with $k$ digits which is not an upper bound, since using the
$k$th digit of $y$ would give a bigger one. So $b$ is an upper bound.

Now suppose that $b' < b$ is also an upper bound. Again there is a first digit
$k$ of $b$ which is different from that of $b'$. This contradicts the fact that
$b_k$ was not an upper bound, since then $b_k > b'$. □
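The digit-by-digit construction in this proof can be imitated on a computer for a concrete set. The sketch below is only an illustration under stated assumptions: the set is $X = \{x \ge 0 : x^2 < 2\}$, described by a hypothetical predicate `is_upper_bound`, and `start` is an integer known not to be an upper bound. At each step it keeps the largest decimal with $k$ digits that is not an upper bound, as the numbers $b_k$ do in the proof.

```python
from fractions import Fraction


def sup_by_digits(is_upper_bound, start, digits):
    """Largest decimal with `digits` digits after the point that is not an
    upper bound, built one digit at a time (exact rational arithmetic)."""
    b = Fraction(start)
    # Integer part: move up while the next integer is still not an upper bound.
    while not is_upper_bound(b + 1):
        b += 1
    # Decimal digits: at position k, take the largest digit that keeps b
    # from being an upper bound.
    for k in range(1, digits + 1):
        step = Fraction(1, 10**k)
        d = 0
        while d < 9 and not is_upper_bound(b + (d + 1) * step):
            d += 1
        b += d * step
    return b


# For X = {x >= 0 : x^2 < 2}, a number c >= 0 is an upper bound iff c^2 >= 2.
approx = sup_by_digits(lambda c: c * c >= 2, 0, 5)
print(float(approx))
```

The successive values agree with the decimal expansion of $\sqrt 2$, which is the least upper bound of this particular set.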

Arithmetic of real numbers 

The next task is to make arithmetic work for the reals: to define addition,
multiplication, subtraction, and division, and to show that the usual rules of
arithmetic hold. This is harder than one might think: addition and
multiplication always start at the right, and for reals there is no right.

The underlying idea is to show that if you take two reals, truncate (cut) them 
further and further to the right and add them (or multiply them, or subtract 
them, etc.) and look only at the digits to the left of any fixed position, the 
digits we see will not be affected by where the truncation takes place, once it is 
well beyond where we are looking. The problem with this is that it isn’t quite 
true. 

Example 0.4.3 (Addition). Consider adding the following two numbers:

    .222222...222...
    .777777...778...

The sum of the truncated numbers will be .9999...9 if we truncate before the
position of the 8, and 1.0000...0 if we truncate after the 8. So there cannot
be any rule which says: the 100th digit will stay the same if you truncate after
the $N$th digit, however large $N$ is. The carry can come from arbitrarily far
to the right.
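The effect in Example 0.4.3 is easy to reproduce with exact integer arithmetic. In the sketch below the position of the 8 (the 30th digit) and the digit strings are assumptions chosen for illustration; the function adds the $k$-digit truncations of two numbers between 0 and 1.

```python
def truncated_sum(x_digits, y_digits, k):
    """Add the k-digit truncations of 0.x_digits and 0.y_digits.
    Returns the sum as a string with exactly k digits after the point."""
    total = int(x_digits[:k]) + int(y_digits[:k])
    whole, frac = divmod(total, 10**k)
    return f"{whole}.{frac:0{k}d}"


# Hypothetical data: all 2's, and 7's with a single 8 in the 30th position.
x = "2" * 40
y = "7" * 29 + "8" + "7" * 10

print(truncated_sum(x, y, 29))  # truncation before the 8
print(truncated_sum(x, y, 30))  # truncation just after the 8
```

Moving the 8 further to the right moves the truncation point at which the first digit changes, which is why no fixed truncation rule can work.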

If you insist on defining everything in terms of digits, it can be done but
is quite involved: even showing that addition is associative involves at least
six different cases, and although none is hard, keeping straight what you are
doing is quite delicate. Exercise 0.4.1 should give you enough of a taste of
this approach. Proposition 0.4.6 allows a general treatment; the development
is quite abstract, and you should definitely not think you need to understand
this in order to proceed.

Let us denote by $\mathbb{D}$ the set of finite decimals.


$\mathbb{D}$ stands for “finite decimal.”


Definition 0.4.4 (Finite decimal continuity). A mapping
$f : \mathbb{D}^n \to \mathbb{D}$ will be called finite decimal continuous
($\mathbb{D}$-continuous) if for all integers $N$ and $k$, there exists $l$
such that if $(x_1, \dots, x_n)$ and $(y_1, \dots, y_n)$ are two elements of
$\mathbb{D}^n$ with all $|x_i|, |y_i| < N$, and if $|x_i - y_i| < 10^{-l}$ for
all $i = 1, \dots, n$, then

$$|f(x_1, \dots, x_n) - f(y_1, \dots, y_n)| < 10^{-k}. \tag{0.4.2}$$


We use $A$ for addition, $M$ for multiplication, and $S$ for subtraction;
the function Assoc is needed to prove associativity of addition.

Since we don’t yet have a notion of subtraction in $\mathbb{R}$, we can’t
write $|x - y| < \epsilon$, much less $\sum (x_i - y_i)^2 < \epsilon^2$, which
involves addition and multiplication besides. Our definition of $k$-close uses
only subtraction of finite decimals.

The notion of $k$-close is the correct way of saying that two numbers agree
to $k$ digits after the decimal point. It takes into account the convention by
which a number ending in all 9’s is equal to the rounded up number ending in
all 0’s: the numbers .9998 and 1.0001 are 3-close.


The functions $\tilde A$ and $\tilde M$ satisfy the conditions of Proposition
0.4.6; thus they apply to the real numbers, while $A$ and $M$ without
tildes apply to finite decimals.


Exercise 0.4.3 asks you to show that the functions $A(x, y) = x + y$,
$M(x, y) = xy$, $S(x, y) = x - y$, $\mathrm{Assoc}(x, y, z) = (x + y) + z$ are
$\mathbb{D}$-continuous, and that $1/x$ is not.

To see why Definition 0.4.4 is the right definition, we need to define what it
means for two points $x, y \in \mathbb{R}^n$ to be close.

Definition 0.4.5 ($k$-close). Two points $x, y \in \mathbb{R}^n$ are $k$-close
if for each $i = 1, \dots, n$, we have $|[x_i]_k - [y_i]_k| \le 10^{-k}$.


Notice that if two numbers are $k$-close for all $k$, then they are equal (see
Exercise 0.4.2).
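As a concrete check of the definition of $k$-close, here is a minimal sketch using Python's `decimal` module. The function names `truncate` and `k_close` are hypothetical, and the inputs are finite decimals given as strings so that no binary rounding interferes; the condition used is $|[x]_k - [y]_k| \le 10^{-k}$.

```python
from decimal import Decimal, ROUND_DOWN


def truncate(x, k):
    """[x]_k: keep exactly k digits after the decimal point, dropping the rest."""
    return Decimal(x).quantize(Decimal(1).scaleb(-k), rounding=ROUND_DOWN)


def k_close(x, y, k):
    """True if the truncations of x and y to k digits differ by at most 10^-k."""
    return abs(truncate(x, k) - truncate(y, k)) <= Decimal(1).scaleb(-k)


print(truncate("0.9998", 3), truncate("1.0001", 3))
print(k_close("0.9998", "1.0001", 3))
print(k_close("0.9998", "1.0001", 4))
```

For points of $\mathbb{R}^n$ the same test would simply be applied to each coordinate.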

If $f : \mathbb{D}^n \to \mathbb{D}$ is $\mathbb{D}$-continuous, then define
$\tilde f : \mathbb{R}^n \to \mathbb{R}$ by the formula

$$\tilde f(x) = \sup_k \inf_{l \ge k} f([x_1]_l, \dots, [x_n]_l). \tag{0.4.3}$$


Proposition 0.4.6. The function $\tilde f : \mathbb{R}^n \to \mathbb{R}$ is the
unique function that coincides with $f$ on $\mathbb{D}^n$ and which satisfies
the continuity condition that for all $k \in \mathbb{N}$, for all
$N \in \mathbb{N}$, there exists $l \in \mathbb{N}$ such that when
$x, y \in \mathbb{R}^n$ are $l$-close and all coordinates $x_i$ of $x$ satisfy
$|x_i| < N$, then $\tilde f(x)$ and $\tilde f(y)$ are $k$-close.

The proof of Proposition 0.4.6 is the object of Exercise 0.4.4. 

With this proposition, setting up arithmetic for the reals is plain sailing. 

Consider the $\mathbb{D}$-continuous functions $A(x, y) = x + y$ and
$M(x, y) = xy$; then we define addition and multiplication of reals by setting

$$x + y = \tilde A(x, y) \quad\text{and}\quad xy = \tilde M(x, y).$$

It isn’t harder to show that the basic laws of arithmetic hold:

$$\begin{aligned}
x + y &= y + x && \text{Addition is commutative.}\\
(x + y) + z &= x + (y + z) && \text{Addition is associative.}\\
x + (-x) &= 0 && \text{Existence of additive inverse.}\\
xy &= yx && \text{Multiplication is commutative.}\\
(xy)z &= x(yz) && \text{Multiplication is associative.}\\
x(y + z) &= xy + xz && \text{Multiplication is distributive over addition.}
\end{aligned} \tag{0.4.4}$$


These are all proved the same way: let us prove the last. Consider the
function $\mathbb{D}^3 \to \mathbb{D}$ given by

$$F(x, y, z) = M(x, A(y, z)) - A(M(x, y), M(x, z)). \tag{0.4.5}$$


We leave it to you to check that $F$ is $\mathbb{D}$-continuous, and that

$$\tilde F(x, y, z) = \tilde M(x, \tilde A(y, z)) - \tilde A(\tilde M(x, y), \tilde M(x, z)). \tag{0.4.6}$$


It is one of the basic irritants 
of elementary school math that 
division is not defined in the world 
of finite decimals. 


But $F$ is identically 0 on $\mathbb{D}^3$, and the identically 0 function on
$\mathbb{R}^3$ is a function which coincides with 0 on $\mathbb{D}^3$ and
satisfies the continuity condition of Proposition 0.4.6, so $\tilde F$ vanishes
identically by the uniqueness part of Proposition 0.4.6.
That is what was to be proved.

This sets up almost all of arithmetic; the missing piece is division. Exercise 
0.4.5 asks you to define division in the reals. 


Sequences and series 


A sequence is an infinite list (of numbers or vectors or matrices . . . ). 


All of calculus is based on this 
definition, and the closely related 
definition of limits of functions. 

If a series converges, then the same list of numbers viewed as a
sequence must converge to 0. The converse is not true. For example,
the harmonic series

$$1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots$$

does not converge, although the terms tend to 0.


Definition 0.4.7 (Convergent sequence). A sequence $a_n$ of real numbers
is said to converge to the limit $a$ if for all $\epsilon > 0$, there exists
$N$ such that for all $n > N$, we have $|a - a_n| < \epsilon$.

Many important sequences appear as partial sums of series. A series is a
sequence where the terms are to be added. If $a_1, a_2, \dots$ is a series of
numbers, then the associated sequence of partial sums is the sequence
$s_1, s_2, \dots$, where

$$s_N = \sum_{n=1}^{N} a_n. \tag{0.4.7}$$

For example, if $a_1 = 1$, $a_2 = 2$, $a_3 = 3$, and so on, then
$s_4 = 1 + 2 + 3 + 4$.

Example of geometric series:

$$2.020202\ldots = 2 + 2(.01) + 2(.01)^2 + \cdots = \frac{2}{1 - (.01)}.$$


It is hard to overstate the im- 
portance of this problem: prov- 
ing that a limit exists without 
knowing ahead of time what it 
is. It was a watershed in the his- 
tory of mathematics, and remains 
a critical dividing point between 
first year calculus and multivari- 
ate calculus, and more generally, 
between elementary mathematics 
and advanced mathematics. 


Definition 0.4.8 (Convergent series). If the sequence of partial sums of
a series has a limit $S$, we say that the series converges, and its limit is

$$\sum_{n=1}^{\infty} a_n = S. \tag{0.4.8}$$


Example 0.4.9 (Geometric series). If $|r| < 1$, then

$$\sum_{n=0}^{\infty} a r^n = \frac{a}{1 - r}. \tag{0.4.9}$$

Indeed, the following subtraction shows that $S_n(1 - r) = a - ar^{n+1}$:

$$\begin{aligned}
S_n &= a + ar + ar^2 + ar^3 + \cdots + ar^n \\
S_n r &= \phantom{a + {}} ar + ar^2 + ar^3 + \cdots + ar^n + ar^{n+1} \\
S_n(1 - r) &= a - ar^{n+1}
\end{aligned}$$

But $\lim_{n \to \infty} ar^{n+1} = 0$ when $|r| < 1$, so we can forget about
the $-ar^{n+1}$: as $n \to \infty$, we have $S_n \to a/(1 - r)$. △
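The identity $S_n(1 - r) = a - ar^{n+1}$ can be checked exactly with rational arithmetic. The sketch below uses the values $a = 2$, $r = 1/100$ of the margin example $2.020202\ldots$; the function name `partial_sum` is an illustrative choice.

```python
from fractions import Fraction


def partial_sum(a, r, n):
    """S_n = a + a r + a r^2 + ... + a r^n."""
    return sum(a * r**k for k in range(n + 1))


a, r = Fraction(2), Fraction(1, 100)
limit = a / (1 - r)

for n in (1, 3, 5):
    s_n = partial_sum(a, r, n)
    # Exact check of the subtraction identity from the example.
    assert s_n * (1 - r) == a - a * r ** (n + 1)
    print(n, float(s_n), float(limit - s_n))
```

The gap between $S_n$ and $a/(1 - r)$ is exactly $ar^{n+1}/(1 - r)$, which shrinks geometrically.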


Proving convergence 

The weakness of the definition of a convergent sequence is that it involves the 
limit value. At first, it is hard to see how you will ever be able to prove that a 
sequence has a limit if you don’t know the limit ahead of time. 

The first result along these lines is the following theorem. 

Theorem 0.4.10. A non-decreasing sequence $a_n$ converges if and only if
it is bounded.

Proof. Since the sequence $a_n$ is bounded, it has a least upper bound $A$. We
claim that $A$ is the limit. This means that for any $\epsilon > 0$, there
exists $N$ such that if $n > N$, then $|a_n - A| < \epsilon$. Choose
$\epsilon > 0$; if $A - a_n > \epsilon$ for all $n$, then $A - \epsilon$ is an
upper bound for the sequence, contradicting the definition of $A$. So there is
a first $N$ with $A - a_N < \epsilon$, and it will do, since when $n > N$, we
must have $A - a_n \le A - a_N < \epsilon$. □

Theorem 0.4.10 has the following consequence: 

Theorem 0.4.11. If $\sum_{n=1}^{\infty} a_n$ is a series such that the series of
absolute values $\sum_{n=1}^{\infty} |a_n|$ converges, then so does the series
$\sum_{n=1}^{\infty} a_n$.





Proof. The series $\sum_{n=1}^{\infty} (a_n + |a_n|)$ is a series of
non-negative numbers, and so the partial sums
$b_m = \sum_{n=1}^{m} (a_n + |a_n|)$ are non-decreasing. They are also
bounded:

$$b_m = \sum_{n=1}^{m} (a_n + |a_n|) \le \sum_{n=1}^{m} 2|a_n| = 2 \sum_{n=1}^{m} |a_n| \le 2 \sum_{n=1}^{\infty} |a_n|. \tag{0.4.11}$$


So (by Theorem 0.4.10) the $b_m$ form a convergent sequence, and finally

$$\sum_{n=1}^{\infty} a_n = \sum_{n=1}^{\infty} (a_n + |a_n|) + \left(-\sum_{n=1}^{\infty} |a_n|\right) \tag{0.4.12}$$


One unsuccessful 19th century definition of continuity stated that
a function $f$ is continuous if it satisfies the intermediate value
theorem: that is, if for all $a < b$, $f$ takes on all values between
$f(a)$ and $f(b)$ at some $c \in (a, b]$. You are asked in Exercise 0.4.7
to show that this does not coincide with the usual definition (and
presumably not with anyone’s intuition of what continuity should mean).


represents the series $\sum_{n=1}^{\infty} a_n$ as the sum of two numbers, each
one the sum of a convergent series. □

The intermediate value theorem 

The intermediate value theorem is a result which appears to be obviously true, 
and which is often useful. Moreover, it follows easily from Theorem 0.4.2 and 
the definition of continuity. 

Theorem 0.4.12 (Intermediate value theorem). If $f : [a, b] \to \mathbb{R}$ is
a continuous function such that $f(a) \le 0$ and $f(b) \ge 0$, then there exists
$c \in [a, b]$ such that $f(c) = 0$.


Proof. Let $X$ be the set of $x \in [a, b]$ such that $f(x) \le 0$. Note that
$X$ is non-empty ($a$ is in it) and it has an upper bound, namely $b$, so that
it has a least upper bound, which we call $c$. We claim $f(c) = 0$.

Since $f$ is continuous, for any $\epsilon > 0$, there exists $\delta > 0$ such
that when $|x - c| < \delta$, then $|f(x) - f(c)| < \epsilon$. Therefore, if
$f(c) > 0$, we can set $\epsilon = f(c)$, and there exists $\delta > 0$ such
that if $|x - c| < \delta$, then $|f(x) - f(c)| < f(c)$. In particular, we see
that if $x > c - \delta/2$, $f(x) > 0$, so $c - \delta/2$ is also an upper
bound for $X$, which is a contradiction.

If $f(c) < 0$, a similar argument shows that there exists $\delta > 0$ such
that $f(c + \delta/2) < 0$, contradicting the assumption that $c$ is an upper
bound for $X$. The only choice left is $f(c) = 0$. □
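The theorem only asserts that a zero exists. One standard way to locate such a zero numerically is bisection, which is not the argument used in the proof above but relies on the same hypotheses. The sketch below is a minimal version under the assumption that $f(a) \le 0 \le f(b)$; the function name and tolerance are illustrative choices, and the test function is $f(x) = x^3 + x + 1$ on $[-1, 0]$.

```python
def bisect_zero(f, a, b, tol=1e-12):
    """Approximate a zero of a continuous f on [a, b], assuming
    f(a) <= 0 <= f(b), by repeatedly halving the interval."""
    if f(a) > 0 or f(b) < 0:
        raise ValueError("need f(a) <= 0 <= f(b)")
    while b - a > tol:
        m = (a + b) / 2
        if f(m) <= 0:
            a = m  # keep the invariant f(a) <= 0
        else:
            b = m  # keep the invariant f(b) > 0 (or the original f(b) >= 0)
    return (a + b) / 2


c = bisect_zero(lambda x: x**3 + x + 1, -1.0, 0.0)
print(c)
```

At every step the interval still satisfies the hypotheses of the theorem, so it always contains a zero.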


0.5 Infinite Sets and Russell’s Paradox 

One reason set theory is accorded so much importance is that Georg Cantor
(1845-1918) discovered that two infinite sets need not have the same “number”
of elements; there isn’t just one infinity. You might think this is just obvious,
for instance because there are more whole numbers than even whole numbers.
But with the definition Cantor gave, two sets $A$ and $B$ have the same number
of elements (the same cardinality) if you can set up a one-to-one correspondence
between them. For instance

    0  1  2  3  4   5   6  ...
    0  2  4  6  8  10  12  ...        0.5.1

establishes a one to one correspondence between the natural numbers and the 
even natural numbers. More generally, any set whose elements you can list has 
the same cardinality as N. But Cantor discovered that R does not have the 
same cardinality as N: it has a bigger infinity of elements. Indeed, imagine 
making any infinite list of real numbers, say between 0 and 1, so that written 
as decimals, your list might look like 

    .154362786453429823763490652367347548757...
    .987354621943756598673562940657349327658...
    .229573521903564355423035465523390080742...
    .104752018746267653209365723689076565787...
    .026328560082356835654432879897652377327...        0.5.2


This argument simply flabber- 
gasted the mathematical world; 
after thousands of years of philo- 
sophical speculation about the in- 
finite, Cantor found a fundamen- 
tal notion that had been com- 
pletely overlooked. 

It would seem likely that $\mathbb{R}$ and $\mathbb{R}^2$ have different
infinities of elements, but that is not the case (see
Exercise 0.4.5).


Now consider the decimal formed by the diagonal digits (the $n$th digit of the
$n$th number above), .18972..., and modify it (almost any way you want) so that
every digit is changed, for instance according to the rule “change 7’s to 5’s
and change anything that is not a 7 to a 7”: in this case, your number becomes
.77757.... Clearly this last number does not appear in your list; it is not the
$n$th element of the list, because it doesn’t have the same $n$th decimal.
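The diagonal construction is mechanical enough to write down. The sketch below applies the rule quoted above to the first digits of the five numbers in list 0.5.2; the function name and the representation of each number as a string of digits are assumptions made for illustration.

```python
def diagonal_escape(rows):
    """Given digit strings (row n must have at least n + 1 digits), return the
    diagonal digits and the modified digits that differ from every row."""
    diagonal = [row[n] for n, row in enumerate(rows)]
    # Rule from the text: change 7's to 5's, and anything else to a 7.
    changed = ["5" if d == "7" else "7" for d in diagonal]
    return "".join(diagonal), "".join(changed)


rows = [
    "1543627864",
    "9873546219",
    "2295735219",
    "1047520187",
    "0263285600",
]
diagonal, changed = diagonal_escape(rows)
print("." + diagonal, "." + changed)

# The new number differs from the nth row in its nth digit.
print(all(changed[n] != row[n] for n, row in enumerate(rows)))
```

The same function works for any finite part of any list, which is the whole point of the argument: no list can contain the number it produces.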

Sets that can be put in one-to-one correspondence with the integers are called
countable; other infinite sets are called uncountable; the set $\mathbb{R}$ of
real numbers is uncountable.

All sorts of questions naturally arise from this proof: are there other infinities
besides those of $\mathbb{N}$ and $\mathbb{R}$? (There are; Cantor showed that
there are infinitely many of them.) Are there infinite subsets of $\mathbb{R}$
that cannot be put into one-to-one correspondence with either $\mathbb{R}$ or
$\mathbb{Z}$? The statement that there are none is called the continuum
hypothesis, and has been shown to be unsolvable: it is consistent with the other
axioms of set theory to assume it is true (Gödel, 1938) or false (Cohen, 1963).
This means that if there is a contradiction in set theory assuming the continuum
hypothesis, then there is a contradiction without assuming it, and if there is
a contradiction in set theory assuming that the continuum hypothesis is false,
then again there is a contradiction without assuming it is false.


Russell’s paradox 

Soon after Cantor published his work on set theory, Bertrand Russell (1872-
1970) wrote him a letter containing the following argument:

This paradox has a long history, in various guises: the Greeks
knew it as the paradox of the barber, who lived on the island of
Milos, and decided to shave all the men of the island who did not
shave themselves. Does the barber shave himself?


Consider the set $X$ of all sets that do not contain themselves. If $X \in X$,
then $X$ does contain itself, so $X \notin X$. But if $X \notin X$, then $X$ is
a set which does not contain itself, so $X \in X$.

Russell’s paradox was (and remains) extremely perplexing: Cantor’s reaction 
was to answer that Russell had completely destroyed his work, showing that 
there is an inconsistency in set theory right at the foundation of the subject. 
History has been kinder, but Russell’s paradox has never been quite “resolved.” 
The “solution,” such as it is, is to say that the naive idea that any property 
defines a set is untenable, and that sets must be built up, allowing you to take 
subsets, unions, products, ... of sets already defined; moreover, to make the 
theory interesting, you must assume the existence of an infinite set. Set theory 
(still an active subject of research) consists of describing exactly the allowed 
construction procedures, and seeing what consequences can be derived. 


0.6 Complex Numbers 


Complex numbers (long considered “impossible” numbers) were
first used in 16th century Italy, as a crutch that made it possible
to find real roots of real cubic polynomials. But they turned out
to have immense significance in many fields of mathematics, leading
John Stillwell to write in his Mathematics and Its History that
“this resolution of the paradox of $\sqrt{-1}$ was so powerful,
unexpected and beautiful that only the word ‘miracle’ seems adequate
to describe it.”


Complex numbers are written $a + bi$, where $a$ and $b$ are real numbers, and
addition and multiplication are defined in Equations 0.6.1 and 0.6.2. It follows
from those rules that $i = \sqrt{-1}$.

The complex number $a + ib$ is often plotted as the point
$\begin{pmatrix} a \\ b \end{pmatrix} \in \mathbb{R}^2$. The real number $a$ is
called the real part of $a + ib$, denoted $\operatorname{Re}(a + ib)$, and the
real number $b$ is called the imaginary part, denoted
$\operatorname{Im}(a + ib)$. The reals $\mathbb{R}$ can be considered as a
subset of the complex numbers $\mathbb{C}$, by identifying $a \in \mathbb{R}$
with $a + i0 \in \mathbb{C}$; such complex numbers are called “real,” as you
might imagine. Real numbers are systematically identified with the real complex
numbers, and $a + i0$ is usually denoted simply $a$.

Numbers of the form $0 + ib$ are called purely imaginary. What complex
numbers, if any, are both real and purely imaginary?¹ If we plot $a + ib$ as
the point $\begin{pmatrix} a \\ b \end{pmatrix} \in \mathbb{R}^2$, what do the
purely real numbers correspond to? The purely imaginary numbers?²

Arithmetic in C 

Complex numbers are added in the obvious way: 

$$(a_1 + ib_1) + (a_2 + ib_2) = (a_1 + a_2) + i(b_1 + b_2). \tag{0.6.1}$$

Thus the identification with R 2 preserves the operation of addition. 

¹ The only complex number which is both real and purely imaginary is
$0 = 0 + 0i$.
² The purely real numbers are all found on the x-axis, the purely imaginary
numbers on the y-axis.

Equation 0.6.2 is not the only definition of multiplication one
can imagine. For instance, we could define
$(a_1 + ib_1) * (a_2 + ib_2) = (a_1 a_2) + i(b_1 b_2)$. But in that case,
there would be lots of elements by which one could not divide,
since the product of any purely real number and any purely imaginary
number would be 0:

$$(a_1 + i0) * (0 + ib_2) = 0.$$


What makes $\mathbb{C}$ interesting is that complex numbers can also be
multiplied:

$$(a_1 + ib_1)(a_2 + ib_2) = (a_1 a_2 - b_1 b_2) + i(a_1 b_2 + a_2 b_1). \tag{0.6.2}$$

This formula consists of multiplying $a_1 + ib_1$ and $a_2 + ib_2$ (treating
$i$ like the variable $x$ of a polynomial) to find

$$(a_1 + ib_1)(a_2 + ib_2) = a_1 a_2 + i(a_1 b_2 + a_2 b_1) + i^2(b_1 b_2) \tag{0.6.3}$$

and then setting $i^2 = -1$.

Example 0.6.1 (Multiplying complex numbers). 


If the product of two non-zero numbers $\alpha$ and $\beta$ is 0:
$\alpha\beta = 0$, then division by either is impossible; if we try to
divide by $\alpha$, we arrive at the contradiction $\beta = 0$:

$$\beta = \frac{\beta\alpha}{\alpha} = \frac{0}{\alpha} = 0.$$


These four properties, concern- 
ing addition, don’t depend on the 
special nature of complex num- 
bers; we can similarly define addi- 
tion for n-tuples of real numbers, 
and these rules will still be true. 


The multiplication in these five properties is of course the special
multiplication of complex numbers, defined in Equation 0.6.2.
Multiplication can only be defined for pairs of real numbers. If we
were to define a new kind of number as the 3-tuple $(a, b, c)$ there
would be no way to multiply two such 3-tuples that satisfies these
five requirements.

There is a way to define mul- 
tiplication for 4-tuples that satis- 
fies all but commutativity, called 
Hamilton's quaternions. 


(a) $(2 + i)(1 - 3i) = (2 + 3) + i(1 - 6) = 5 - 5i$   (b) $(1 + i)^2 = 2i$. △   0.6.4
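Equation 0.6.2 can be coded directly on pairs of real numbers, which makes it easy to check Example 0.6.1 and to compare with a built-in complex type. In this sketch a complex number $a + ib$ is represented by the tuple `(a, b)`; the function name `cmul` is an illustrative choice.

```python
def cmul(z, w):
    """Multiply (a1 + i b1)(a2 + i b2) using Equation 0.6.2.
    Complex numbers are represented as pairs (real part, imaginary part)."""
    a1, b1 = z
    a2, b2 = w
    return (a1 * a2 - b1 * b2, a1 * b2 + a2 * b1)


print(cmul((2, 1), (1, -3)))  # part (a) of Example 0.6.1
print(cmul((1, 1), (1, 1)))   # part (b) of Example 0.6.1

# Same products with Python's built-in complex numbers, for comparison.
print(complex(2, 1) * complex(1, -3), complex(1, 1) ** 2)
```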

Addition and multiplication of reals viewed as complex numbers coincide
with ordinary addition and multiplication:

$$(a + i0) + (b + i0) = (a + b) + i0 \qquad (a + i0)(b + i0) = (ab) + i0. \tag{0.6.5}$$

Exercise 0.6.1 asks you to check the following nine rules, for
$z_1, z_2, z_3 \in \mathbb{C}$:

(1) $(z_1 + z_2) + z_3 = z_1 + (z_2 + z_3)$    Addition is associative.
(2) $z_1 + z_2 = z_2 + z_1$    Addition is commutative.
(3) $z + 0 = z$    0 (i.e., the complex number $0 + 0i$) is an additive identity.
(4) $(a + ib) + (-a - ib) = 0$    $(-a - ib)$ is the additive inverse of $a + ib$.

(5) $(z_1 z_2) z_3 = z_1 (z_2 z_3)$    Multiplication is associative.
(6) $z_1 z_2 = z_2 z_1$    Multiplication is commutative.
(7) $1z = z$    1 (i.e., the complex number $1 + 0i$) is a multiplicative identity.
(8) $(a + ib)\left(\frac{a}{a^2 + b^2} - i\frac{b}{a^2 + b^2}\right) = 1$    If $z \ne 0$, then $z$ has a multiplicative inverse.
(9) $z_1(z_2 + z_3) = z_1 z_2 + z_1 z_3$    Multiplication is distributive over addition.


The complex conjugate


Definition 0.6.2 (Complex conjugate). The complex conjugate of the
complex number $z = a + ib$ is the number $\bar z = a - ib$.


Complex conjugation preserves all of arithmetic:

$$\overline{z + w} = \bar z + \bar w \quad\text{and}\quad \overline{zw} = \bar z\,\bar w. \tag{0.6.6}$$




Figure 0.6.1. 


When multiplying two complex numbers, the absolute values are
multiplied and the arguments (polar angles) are added.

The real numbers are the complex numbers $z$ which are equal to their complex
conjugates: $\bar z = z$, and the purely imaginary complex numbers are those
which are the opposites of their complex conjugates: $\bar z = -z$.

There is a very useful way of writing the length of a complex number in
terms of complex conjugates: If $z = a + ib$, then $z \bar z = a^2 + b^2$. The
number

$$|z| = \sqrt{a^2 + b^2} = \sqrt{z \bar z} \tag{0.6.7}$$

is called the absolute value (or the modulus) of $z$. Clearly, $|a + ib|$ is
the distance from the origin to $\begin{pmatrix} a \\ b \end{pmatrix}$.


Complex numbers in polar coordinates 

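Let $z = a + ib \ne 0$ be a complex number. Then the point
$\begin{pmatrix} a \\ b \end{pmatrix}$ can be represented in polar coordinates
as $\begin{pmatrix} r\cos\theta \\ r\sin\theta \end{pmatrix}$, where

$$r = \sqrt{a^2 + b^2} = |z|, \tag{0.6.8}$$

and $\theta$ is an angle such that

$$\cos\theta = \frac{a}{r} \quad\text{and}\quad \sin\theta = \frac{b}{r}, \tag{0.6.9}$$

so that

$$z = r(\cos\theta + i\sin\theta). \tag{0.6.10}$$

The polar angle $\theta$, called the argument of $z$, is determined by
Equation 0.6.9 up to addition of a multiple of $2\pi$.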

The marvelous thing about this polar representation is that it gives a geo- 
metric representation of multiplication, as shown in Figure 0.6.1. 


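Proposition 0.6.3 (Geometrical representation of multiplication of
complex numbers). The modulus of the product $z_1 z_2$ is the product of
the moduli $|z_1|\,|z_2|$.

The polar angle of the product is the sum of the polar angles
$\theta_1$, $\theta_2$:

$$\bigl(r_1(\cos\theta_1 + i\sin\theta_1)\bigr)\bigl(r_2(\cos\theta_2 + i\sin\theta_2)\bigr) = r_1 r_2\bigl(\cos(\theta_1 + \theta_2) + i\sin(\theta_1 + \theta_2)\bigr).$$


Proof. Multiply out, and apply the addition rules of trigonometry:

$$\begin{aligned}
\cos(\theta_1 + \theta_2) &= \cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2 \\
\sin(\theta_1 + \theta_2) &= \sin\theta_1\cos\theta_2 + \cos\theta_1\sin\theta_2. \qquad \square
\end{aligned}$$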
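Proposition 0.6.3 can be checked numerically with Python's standard `cmath` module, whose `polar` and `rect` functions convert between $a + ib$ and $(r, \theta)$. The sketch below is only a numerical illustration; the function name `polar_product` and the sample values are arbitrary choices.

```python
import cmath


def polar_product(z1, z2):
    """Multiply two complex numbers by multiplying moduli and adding angles."""
    r1, theta1 = cmath.polar(z1)
    r2, theta2 = cmath.polar(z2)
    return cmath.rect(r1 * r2, theta1 + theta2)


z1, z2 = complex(2, 1), complex(1, -3)
print(polar_product(z1, z2))  # computed through polar coordinates
print(z1 * z2)                # computed with Equation 0.6.2
```

The two results agree up to round-off, since the sum of the angles is only determined up to a multiple of $2\pi$ and sine and cosine do not see that ambiguity.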


The following formula, known as de Moivre’s formula, follows immediately:

Figure 0.6.2.

The fifth roots of $z$ form a regular pentagon, with one vertex at
polar angle $\theta/5$, and the others rotated from that one by multiples of
$2\pi/5$.


Immense psychological difficulties had to be overcome before
complex numbers were accepted as an integral part of mathematics;
when Gauss came up with his proof of the fundamental theorem of
algebra, complex numbers were still not sufficiently respectable
that he could use them in his statement of the theorem
(although the proof depends on them).


Corollary 0.6.4 (De Moivre’s formula). If $z = r(\cos\theta + i\sin\theta)$,
then

$$z^n = r^n(\cos n\theta + i\sin n\theta). \tag{0.6.12}$$


De Moivre’s formula itself has a very important consequence, showing that in
the process of adding a square root of -1 to the real numbers, we have actually
added all the roots of complex numbers one might hope for.


Proposition 0.6.5. Every complex number $z = r(\cos\theta + i\sin\theta)$ with
$r \ne 0$ has $n$ distinct complex $n$th roots, which are the numbers

$$r^{1/n}\left(\cos\frac{\theta + 2k\pi}{n} + i\sin\frac{\theta + 2k\pi}{n}\right), \quad k = 0, \dots, n - 1. \tag{0.6.13}$$


Note that $r^{1/n}$ stands for the positive real $n$th root of the positive
number $r$. Figure 0.6.2 illustrates Proposition 0.6.5 for $n = 5$.


Proof. All that needs to be checked is that

(1) $(r^{1/n})^n = r$, which is true by definition;

(2)
$$\cos n\frac{\theta + 2k\pi}{n} = \cos\theta \quad\text{and}\quad \sin n\frac{\theta + 2k\pi}{n} = \sin\theta, \tag{0.6.14}$$
which is true since $n\frac{\theta + 2k\pi}{n} = \theta + 2k\pi$, and sin and
cos are periodic with period $2\pi$; and

(3) The numbers in Equation 0.6.13 are distinct, which is true since the
polar angles do not differ by a multiple of $2\pi$. □
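Formula 0.6.13 translates directly into code. The sketch below uses Python's `cmath` module to compute the $n$ roots and checks numerically that each one, raised to the $n$th power, gives back $z$; the function name `nth_roots` and the sample value of $z$ are illustrative assumptions.

```python
import cmath
import math


def nth_roots(z, n):
    """The n complex nth roots of a non-zero z, following Equation 0.6.13."""
    r, theta = cmath.polar(z)
    return [
        cmath.rect(r ** (1 / n), (theta + 2 * math.pi * k) / n)
        for k in range(n)
    ]


z = complex(1, 1)
roots = nth_roots(z, 5)
for w in roots:
    print(w, abs(w**5 - z))
```

Plotting the five roots would show the regular pentagon described in the caption of Figure 0.6.2.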


A great deal more is true: all polynomial equations with complex coefficients 
have all the roots one might hope for. This is the content of the fundamen- 
tal theorem of algebra, Theorem 1.6.10, proved by d’Alembert in 1746 and by 
Gauss around 1799. This milestone of mathematics followed by some 200 years 
the first introduction of complex numbers, about 1550, by several Italian math- 
ematicians who were trying to solve cubic equations. Their work represented 
the rebirth of mathematics in Europe after a long sleep, of over 15 centuries. 


Historical background: solving the cubic equation 

We will show that a cubic equation can be solved using formulas analogous to
the formula

$$\frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \tag{0.6.15}$$

for the quadratic equation $ax^2 + bx + c = 0$.

Let us start with two examples; the explanation of the tricks will follow.


Example 0.6.6 (Solving a cubic equation). Let us solve the equation
$x^3 + x + 1 = 0$. First substitute $x = u - 1/(3u)$, to get

$$\left(u - \frac{1}{3u}\right)^3 + \left(u - \frac{1}{3u}\right) + 1 = u^3 - u + \frac{1}{3u} - \frac{1}{27u^3} + u - \frac{1}{3u} + 1 = 0. \tag{0.6.16}$$

After simplification and multiplication by $u^3$ this becomes

$$u^6 + u^3 - \frac{1}{27} = 0. \tag{0.6.17}$$

This is a quadratic equation for $u^3$, which can be solved by formula 0.6.15,
to yield

$$u^3 = \frac{1}{2}\left(-1 \pm \sqrt{\frac{31}{27}}\right) \approx 0.0358\ldots,\ -1.0358\ldots. \tag{0.6.18}$$

Both of these numbers have real cube roots: approximately $u_1 \approx 0.3295$
and $u_2 \approx -1.0118$.

This allows us to find $x = u - 1/(3u)$:

$$x = u_1 - \frac{1}{3u_1} = u_2 - \frac{1}{3u_2} \approx -0.6823. \quad \triangle \tag{0.6.19}$$
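The arithmetic of Example 0.6.6 can be retraced with real floating-point numbers only. The sketch below follows the steps of the example; the helper `real_cube_root` is an illustrative name, introduced because raising a negative float to the power 1/3 does not return the real cube root in Python.

```python
import math


def real_cube_root(t):
    """Real cube root of a real number, keeping the sign of t."""
    return math.copysign(abs(t) ** (1 / 3), t)


# Solve v^2 + v - 1/27 = 0 for v = u^3 (Equation 0.6.17).
s = math.sqrt(31 / 27)
v1, v2 = (-1 + s) / 2, (-1 - s) / 2

# Real cube roots, then back-substitute x = u - 1/(3u).
u1, u2 = real_cube_root(v1), real_cube_root(v2)
x1 = u1 - 1 / (3 * u1)
x2 = u2 - 1 / (3 * u2)

print(v1, v2)
print(u1, u2)
print(x1, x2, x1**3 + x1 + 1)
```

Both choices of cube root lead to the same real root, as the example states.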


Here we see something bizarre: 
in Example 0.6.6, the polynomial 
has only one real root and we can 
find it using only real numbers, 
but in Example 0.6.7 there are 
three real roots, and we can't find 
any of them using only real num- 
bers. We will see below that it is 
always true that when Cardano’s 
formula is used, then if a real poly- 
nomial has one real root, we can 
always find it using only real num- 
bers, but if it has three real roots, 
we never can find any of them us- 
ing real numbers. 


Example 0.6.7. Let us solve the equation $x^3 - 3x + 1 = 0$. As we will explain
below, the right substitution to make in this case is $x = u + 1/u$, which leads
to

$$\left(u + \frac{1}{u}\right)^3 - 3\left(u + \frac{1}{u}\right) + 1 = 0. \tag{0.6.20}$$

After multiplying out, canceling and multiplying by $u^3$, this gives the
quadratic equation

$$u^6 + u^3 + 1 = 0 \quad\text{with solutions}\quad v_{1,2} = \frac{-1 \pm i\sqrt{3}}{2} = \cos\frac{2\pi}{3} \pm i\sin\frac{2\pi}{3}. \tag{0.6.21}$$

The cube roots of $v_1$ (with positive imaginary part) are

$$\cos\frac{2\pi}{9} + i\sin\frac{2\pi}{9}, \quad \cos\frac{8\pi}{9} + i\sin\frac{8\pi}{9}, \quad \cos\frac{14\pi}{9} + i\sin\frac{14\pi}{9}. \tag{0.6.22}$$

In all three cases, we have $1/u = \bar u$, so that
$u + 1/u = 2\operatorname{Re} u$, leading to the three roots

$$x_1 = 2\cos\frac{2\pi}{9} \approx 1.532088, \quad x_2 = 2\cos\frac{8\pi}{9} \approx -1.879385, \quad x_3 = 2\cos\frac{14\pi}{9} \approx 0.347296. \quad \triangle \tag{0.6.23}$$
Derivation of Cardano’s formulas


The substitutions $x = u - 1/(3u)$ in Example 0.6.6 and $x = u + 1/u$
in Example 0.6.7 were special cases.

Eliminating the term in $x^2$ means changing the roots so that
their sum is 0: If the roots of a cubic polynomial are $a_1$, $a_2$, and
$a_3$, then we can write the polynomial as

$$\begin{aligned}
p &= (x - a_1)(x - a_2)(x - a_3) \\
  &= x^3 - (a_1 + a_2 + a_3)x^2 + (a_1 a_2 + a_1 a_3 + a_2 a_3)x - a_1 a_2 a_3.
\end{aligned}$$

Thus eliminating the term in $x^2$ means that $a_1 + a_2 + a_3 = 0$. We
will use this to prove Proposition 0.6.9.


If we start with the equation $x^3 + ax^2 + bx + c = 0$, we can eliminate the
term in $x^2$ by setting $x = y - a/3$: the equation becomes

$$y^3 + py + q = 0, \quad\text{where}\quad p = b - \frac{a^2}{3} \quad\text{and}\quad q = c - \frac{ab}{3} + \frac{2a^3}{27}. \tag{0.6.24}$$

Now set $y = u - \frac{p}{3u}$; the equation $y^3 + py + q = 0$ then becomes

$$u^6 + qu^3 - \frac{p^3}{27} = 0, \tag{0.6.25}$$

which is a quadratic equation for $u^3$.

Let $v_1$ and $v_2$ be the two solutions of the quadratic equation
$v^2 + qv - \frac{p^3}{27} = 0$, and let $u_{i,1}, u_{i,2}, u_{i,3}$ be the
three cubic roots of $v_i$ for $i = 1, 2$. We now have apparently six roots for
the equation $x^3 + px + q = 0$: the numbers

$$y_{i,j} = u_{i,j} - \frac{p}{3u_{i,j}}, \quad i = 1, 2; \quad j = 1, 2, 3. \tag{0.6.26}$$

Exercise 0.6.2 asks you to show that $-p/(3u_{1,j})$ is a cubic root of $v_2$,
and that we can renumber the cube roots of $v_2$ so that
$-p/(3u_{1,j}) = u_{2,j}$. If that is done, we find that $y_{1,j} = y_{2,j}$
for $j = 1, 2, 3$; this explains why the apparently six roots are really only
three.
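The derivation above can be turned into a short numerical routine. The sketch below is one possible arrangement, not a definitive implementation: it follows Equations 0.6.24 to 0.6.26 using complex arithmetic from Python's `cmath` module, takes only one solution $v$ of the quadratic (the other gives the same three roots), and picks the solution of larger modulus purely as a numerical precaution. The function name `cardano_roots` is an illustrative choice.

```python
import cmath


def cardano_roots(a, b, c):
    """Roots of x^3 + a x^2 + b x + c = 0 via the substitutions
    x = y - a/3 and y = u - p/(3u)."""
    p = b - a * a / 3
    q = c - a * b / 3 + 2 * a**3 / 27
    # Solve v^2 + q v - p^3/27 = 0 for v = u^3.
    s = cmath.sqrt(q * q + 4 * p**3 / 27)
    v = max((-q + s) / 2, (-q - s) / 2, key=abs)
    if v == 0:
        # This happens only when p = q = 0: a triple root at x = -a/3.
        return [-a / 3] * 3
    u0 = v ** (1 / 3)                      # one complex cube root of v
    omega = cmath.exp(2j * cmath.pi / 3)   # primitive cube root of 1
    roots = []
    for j in range(3):
        u = u0 * omega**j
        roots.append(u - p / (3 * u) - a / 3)
    return roots


for r in cardano_roots(0, -3, 1):   # the cubic of Example 0.6.7
    print(r)
```

For the cubic of Example 0.6.7 the intermediate values $u$ are genuinely complex even though the three printed roots are real up to round-off.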


The discriminant of the cubic 


Definition 0.6.8 (Discriminant of cubic equation). The number
$\Delta = 27q^2 + 4p^3$ is called the discriminant of the cubic equation
$x^3 + px + q = 0$.

Proposition 0.6.9. The discriminant $\Delta$ vanishes exactly when
$x^3 + px + q = 0$ has a double root.

Proof. If there is a double root, then the roots are necessarily
$\{a, a, -2a\}$ for some number $a$, since the sum of the roots is 0. Multiply
out

$$(x - a)^2(x + 2a) = x^3 - 3a^2 x + 2a^3, \quad\text{so}\quad p = -3a^2 \text{ and } q = 2a^3,$$

and indeed $4p^3 + 27q^2 = -4 \cdot 27a^6 + 4 \cdot 27a^6 = 0$.

Now we need to show that if the discriminant is 0, the polynomial has a
double root. Suppose $\Delta = 0$, and call $a$ the square root of $-p/3$ such
that $2a^3 = q$; such a square root exists since
$4a^6 = 4(-p/3)^3 = -4p^3/27 = q^2$. Now multiply out

$$(x - a)^2(x + 2a) = x^3 + x(-4a^2 + a^2) + 2a^3 = x^3 + px + q,$$

and we see that $a$ is a double root of our cubic polynomial. □
Cardano’s formula for real polynomials

Suppose $p, q$ are real. Figure 0.6.3 should explain why equations with double
roots are the boundary between equations with one real root and equations
with three real roots.

Proposition 0.6.10 (Number of real roots of a polynomial). The
real cubic polynomial $x^3 + px + q$ has three real roots if the discriminant
$27q^2 + 4p^3 < 0$, and one real root if $27q^2 + 4p^3 > 0$.



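Proof. If the polynomial $f(x) = x^3 + px + q$ has three real roots, then it
has a positive maximum at $-\sqrt{-p/3}$, and a negative minimum at
$\sqrt{-p/3}$. In particular, $p$ must be negative. Thus we must have

$$f\left(-\sqrt{-\frac{p}{3}}\right) f\left(\sqrt{-\frac{p}{3}}\right) = \left(-\frac{2p}{3}\sqrt{-\frac{p}{3}} + q\right)\left(\frac{2p}{3}\sqrt{-\frac{p}{3}} + q\right) < 0. \tag{0.6.27}$$

After a bit of computation, this becomes the result we want:

$$q^2 + \frac{4p^3}{27} < 0. \tag{0.6.28}$$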


Figure 0.6.3.

The graphs of three cubic polynomials. The polynomial at the top has three roots. As it is varied, the two roots to the left coalesce to give a double root, as shown by the middle figure. If the polynomial is varied a bit further, the double root vanishes (actually becoming a pair of complex conjugate roots).


Thus indeed, if a real cubic polynomial has three real roots, and you want to 
find them by Cardano’s formula, you must use complex numbers, even though 
both the problem and the result involve only reals. Faced with this dilemma, 
the Italians of the 16th century, and their successors until about 1800, held 
their noses and computed with complex numbers. The name “imaginary” they 
used for such numbers expresses what they thought of them. 
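The point of the paragraph above can be seen numerically. The following sketch assumes the usual Cardano substitution $x = u - p/(3u)$, under which $v = u^3$ is a root of $v^2 + qv - p^3/27 = 0$; the function name `cardano_roots` and the sample cubic are ours. For a cubic with three real roots, the intermediate quantity $v$ is genuinely complex, yet the roots come out real up to rounding.

```python
import cmath


def cardano_roots(p, q):
    """Roots of x^3 + p x + q = 0 via the substitution x = u - p/(3u).

    Assumes p != 0, so that u is never zero.
    """
    # v = u^3 satisfies v^2 + q v - p^3/27 = 0; take one of its two roots.
    v = (-q + cmath.sqrt(q * q + 4 * p**3 / 27)) / 2
    u0 = v ** (1 / 3)                      # one cube root of v
    omega = cmath.exp(2j * cmath.pi / 3)   # a primitive cube root of 1
    roots = []
    for j in range(3):
        u = u0 * omega**j                  # the three cube roots of v
        roots.append(u - p / (3 * u))
    return roots


# (x - 2)(x - 3)(x + 5) = x^3 - 19x + 30 has three real roots,
# and its discriminant is negative.
p, q = -19, 30
delta = 27 * q**2 + 4 * p**3
roots = cardano_roots(p, q)
```

Here `cmath.sqrt` is applied to a negative number, which is exactly the step that forced sixteenth-century algebraists to compute with complex numbers.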

Several cubics are proposed in the exercises, as well as an alternative to 
Cardano’s formula which applies to cubics with three real roots (Exercise 0.6.6), 
and a sketch of how to deal with quartic equations (Exercise 0.6.7) . 


0.7 Exercises 


Exercises for Section 0.4: Real Numbers

Stars (*) denote difficult exercises. Two stars indicate a particularly challenging exercise.

Many of the exercises for Chapter 0 are quite theoretical, and too difficult for students taking multivariate calculus for the first time. They are intended for use when the book is being used for a first analysis class. Exceptions include Exercises 0.5.1 and part (a) of 0.5.2.

0.4.1 (a) Let x and y be two positive reals. Show that x + y is well defined by showing that for any k, the digit in the kth position of $[x]_N + [y]_N$ is the same for all sufficiently large N. Note that N cannot depend just on k, but must depend also on x and y.

*(b) Now drop the hypothesis that the numbers are positive, and try to define addition. You will find that this is quite a bit harder than part (a).

*(c) Show that addition is commutative. Again, this is a lot easier when the numbers are positive.

**(d) Show that addition is associative, i.e., x + (y + z) = (x + y) + z. This is much harder, and requires separate consideration of the cases where each of x, y and z is positive and negative.

0.4.2 Show that if two numbers are k-close for all k, then they are equal.

*0.4.3 Show that the functions A(x,y) = x + y, M(x,y) = xy, S(x,y) = x − y, (x + y) + 2 are $\mathbb{D}$-continuous, and that 1/x is not. Notice that for A and S, the $l$ of Definition 0.4.4 does not depend on N, but that it does for M.


**0.4.4 Prove Proposition 0.4.6. This can be broken into the following steps.

(a) Show that $\sup_k \inf_{l \ge k} f([x_1]_l, \dots, [x_n]_l)$ is well defined, i.e., that the sets of numbers involved are bounded. Looking at the function S from Exercise 0.4.3, explain why both the sup and the inf are there.

(b) Show that the function f has the required continuity properties.

(c) Show the uniqueness.


*0.4.5 Define division of reals, using the following steps.

(a) Show that the algorithm of long division of a positive finite decimal a by a positive finite decimal b defines a repeating decimal a/b, and that b(a/b) = a.

(b) Show that the function inv(x) defined for x > 0 by the formula

$$\operatorname{inv}(x) = \inf_k \frac{1}{[x]_k}$$

satisfies x inv(x) = 1 for all x > 0.

(c) Now define the inverse for any x ≠ 0, and show that x inv(x) = 1 for all x ≠ 0.

Table 0.4.6

digit position    digit 0    digit 1
even              left       right
odd               right      left

**0.4.6 In this exercise we will construct a continuous mapping $\gamma : [0, 1] \to \mathbb{R}^2$, the image of which is a (full) triangle T. We will write our numbers in [0, 1) in base 2, so such a number might be something like .0011101000011..., and we will use Table 0.4.6.

Take a right triangle T. We will associate to a string $s = s_1, s_2, \dots$ of digits 0 and 1 a sequence of points $x_0, x_1, x_2, \dots$ of T by starting at the right angle $x_0(s)$, dropping the perpendicular to the opposite side, landing at $x_1(s)$, and deciding to turn left or right according to the digit $s_1$, as interpreted by the bottom line of the table, since this digit is the first digit (and therefore in an odd position): on 0 turn right and on 1 turn left.

Now drop the perpendicular to the opposite side, landing at $x_2(s)$, and turn right or left according to the digit $s_2$, as interpreted by the top line of the table, etc.
Figure 0.4.6.

This sequence corresponds to the string of digits 00100010010....

This construction is illustrated in Figure 0.4.6.

(a) Show that for any string of digits (s), the sequence $x_n(s)$ converges.

(b) Suppose a number $t \in [0, 1]$ can be written in base 2 in two different ways (one ending in 0’s and the other in 1’s), and call (s), (s') the two strings of digits. Show that

$$\lim_{n \to \infty} x_n(s) = \lim_{n \to \infty} x_n(s').$$

Hint: Construct the sequences associated to .1000... and .0111....

This allows us to define $\gamma(t) = \lim_{n \to \infty} x_n(s)$.

(c) Show that $\gamma$ is continuous.

(d) Show that every point in T is in the image of $\gamma$. What is the maximum number of distinct numbers $t_1, \dots, t_k$ such that $\gamma(t_1) = \dots = \gamma(t_k)$? Hint: Choose a point in T, and draw a path of the sort above which leads to it.

0.4.7 (a) Show that the function

$$f(x) = \begin{cases} \ldots & \text{if } x \neq 0 \\ \ldots & \text{if } x = 0 \end{cases}$$

is not continuous.

(b) Show that f satisfies the conclusion of the intermediate value theorem: if $f(x_1) = a_1$ and $f(x_2) = a_2$, then for any number a between $a_1$ and $a_2$, there exists a number x between $x_1$ and $x_2$ such that f(x) = a.

Exercises for Section 0.5: Infinite Sets and Russell’s Paradox

0.5.1 (a) Show that the set of rational numbers is countable, i.e., that you can list all rational numbers.

(b) Show that the set $\mathbb{D}$ of finite decimals is countable.

0.5.2 (a) Show that the open interval (−1, 1) has the same infinity of points as the reals. Hint: Consider the function $g(x) = \tan(\pi x/2)$.

*(b) Show that the closed interval [−1, 1] has the same infinity of points as the reals. For some reason, this is much trickier than (a). Hint: Choose two sequences, (1) $a_0 = 1, a_1, a_2, \dots$; and (2) $b_0 = -1, b_1, b_2, \dots$ and consider the map

g(x) = x if x is not in either sequence,
$g(a_n) = a_{n+1}$,
$g(b_n) = b_{n+1}$.

*(c) Show that the points of the circle

$$\left\{ \begin{pmatrix} x \\ y \end{pmatrix} \in \mathbb{R}^2 \;\middle|\; x^2 + y^2 = 1 \right\}$$

have the same infinity of elements as $\mathbb{R}$. Hint: Again, try to choose an appropriate sequence.
*(d) Show that $\mathbb{R}^2$ has the same infinity of elements as $\mathbb{R}$.

*0.5.3 Is it possible to make a list of the rationals in [0, 1), written as decimals, so that the entries on the diagonal also give a rational number?

*0.5.4 An algebraic number is a root of a polynomial equation with integer coefficients: for instance, the rational number p/q is algebraic, since it is a solution of qx − p = 0, and so is $\sqrt{2}$, since it is a root of $x^2 - 2 = 0$. A number that is not algebraic is called transcendental. It isn't obvious that there are any transcendental numbers; the following exercise gives a (highly unsatisfactory) proof for their existence.

(a) Show that the set of all algebraic numbers is countable. Hint: List the finite collection of all roots of linear polynomials with coefficients with absolute value ≤ 1. Then list the roots of all quadratic equations with coefficients ≤ 2 (which will include the linear equations, for instance $0x^2 + 2x - 1 = 0$), then all roots of cubic equations with coefficients ≤ 3, etc.

(b) Derive from part (a) that there exist transcendental numbers, in fact uncountably many of them.

Exercise 0.5.4, part (b): This proof, due to Cantor, proves that transcendental numbers exist without exhibiting a single one. Many contemporaries of Cantor were scandalized, largely for this reason.

0.5.5 Show that if $f : [a, b] \to [a, b]$ is continuous, there exists $c \in [a, b]$ with f(c) = c.

Exercise 0.5.5 is the one-dimensional case of the celebrated Brouwer fixed point theorem, to be discussed in a subsequent volume. In dimension one it is an easy consequence of the intermediate value theorem, but in higher dimensions (even two) it is quite a delicate result.

0.5.6 Show that if p(x) is a polynomial of odd degree with real coefficients, then there is a real number c such that p(c) = 0.

Exercises for Section 0.6: Complex Numbers

0.6.1 Verify the nine rules for addition and multiplication of complex numbers. Statements (5) and (9) are the only ones that are not immediate.

For Exercise 0.6.2, see the subsection on the derivation of Cardano’s formulas (Equation 0.6.26 in particular).

0.6.2 Show that $-p/(3u_{1,j})$ is a cubic root of $v_2$, and that we can renumber the cube roots of $v_2$ so that $-p/(3u_{1,j}) = u_{2,j}$.

0.6.3 (a) Find all the cubic roots of 1.

(b) Find all the 4th roots of 1.

*(c) Find all the 5th roots of 1. Use your formula to construct a regular pentagon using ruler and compass construction.

(d) Find all the 6th roots of 1.

0.6.4 Show that the following cubics have exactly one real root, and find it.

(a) $x^3 - 18x + 35 = 0$

(b) $x^3 + 3x^2 + x + 2 = 0$

0.6.5 Show that the polynomial $x^3 - 7x + 6$ has three real roots, and find them.
In Exercise 0.6.6, part (a), use de Moivre’s formula:

$$\cos n\theta + i \sin n\theta = (\cos\theta + i \sin\theta)^n.$$

Exercise 0.6.7 uses results from Section 3.1.

Figure 0.6.7(A).

The two parabolas of Equation 0.7.1: note that their axes are respectively the y-axis and the x-axis.

Figure 0.6.7(B).

The three pairs of lines that go through the intersections of the two parabolas.

0.6.6 There is a way of finding the roots of real cubics with three real roots, using only real numbers and a bit of trigonometry.

(a) Prove the formula $4\cos^3\theta - 3\cos\theta - \cos 3\theta = 0$.

(b) Set y = ax in the equation $x^3 + px + q = 0$, and show that there is a value of a for which the equation becomes $4y^3 - 3y - q_1 = 0$; find the value of a and of $q_1$.

(c) Show that there exists an angle $\theta$ such that $\cos 3\theta = q_1$ precisely when $27q^2 + 4p^3 < 0$, i.e., precisely when the original polynomial has three real roots.

(d) Find a formula (involving arccos) for all three roots of a real cubic polynomial with three real roots.

*0.6.7 In this exercise, we will find formulas for the solution of 4th degree polynomials, known as quartics. Let $w^4 + aw^3 + bw^2 + cw + d$ be a quartic polynomial.

(a) Show that if we set w = x − a/4, the quartic equation becomes

$$x^4 + px^2 + qx + r = 0,$$

and compute p, q and r in terms of a, b, c, d.

(b) Now set $y = x^2 + p/2$, and show that solving the quartic is equivalent to finding the intersections of the parabolas $\Gamma_1$ and $\Gamma_2$ of equation

$$x^2 - y + \frac{p}{2} = 0 \quad\text{and}\quad y^2 + qx + r - \frac{p^2}{4} = 0$$

respectively, pictured in Figure 0.6.7(A).

The parabolas $\Gamma_1$ and $\Gamma_2$ intersect (usually) in four points, and the curves of equation

$$f_m\begin{pmatrix} x \\ y \end{pmatrix} = x^2 - y + \frac{p}{2} + m\left(y^2 + qx + r - \frac{p^2}{4}\right) = 0 \tag{0.7.1}$$

are exactly the curves given by quadratic equations which pass through those four points; some of these curves are shown in Figure 0.6.7(C).

(c) What can you say about the curve given by Equation 0.7.1 when m = 1? When m is negative? When m is positive?

(d) The assertion in (b) is not quite correct: there is one curve that passes through those four points, and which is given by a quadratic equation, that is missing from the family given by Equation 0.7.1. Find it.

(e) The next step is the really clever part of the solution. Among those curves, there are three, shown in Figure 0.6.7(B), that consist of a pair of lines, i.e., each such “degenerate” curve consists of a pair of diagonals of the quadrilateral formed by the intersection points of the parabolas. Since there are three of these, we may hope that the corresponding values of m are solutions of a cubic equation, and this is indeed the case. Using the fact that a pair of lines is not a smooth curve near the point where they intersect, show that the numbers m for which the equation $f_m = 0$ defines a pair of lines, and the coordinates x, y of the point where they intersect, are the solutions of the system of three equations in three unknowns,

$$y^2 + qx + r - \frac{p^2}{4} + m\left(x^2 - y + \frac{p}{2}\right) = 0$$
$$2y - m = 0$$
$$q + 2mx = 0.$$

(f) Expressing x and y in terms of m using the last two equations, show that m satisfies the equation

$$m^3 - 2pm^2 + (p^2 - 4r)m + q^2 = 0;$$

this equation is called the resolvent cubic of the original quartic equation.

FIGURE 0.6.7(C). The curves $f_m\begin{pmatrix} x \\ y \end{pmatrix} = x^2 - y + \frac{p}{2} + m\left(y^2 + qx + r - \frac{p^2}{4}\right) = 0$ for seven different values of m.

Let $m_1$, $m_2$ and $m_3$ be the roots of the equation, and let $\begin{pmatrix} x_1 \\ y_1 \end{pmatrix}$, $\begin{pmatrix} x_2 \\ y_2 \end{pmatrix}$ and $\begin{pmatrix} x_3 \\ y_3 \end{pmatrix}$ be the corresponding points of intersection of the diagonals. This doesn't quite give the equations of the lines forming the two diagonals. The next part gives a way of finding them.

(g) Let $\begin{pmatrix} x_1 \\ y_1 \end{pmatrix}$ be one of the points of intersection, as above, and consider the line $l_k$ through the point $\begin{pmatrix} x_1 \\ y_1 \end{pmatrix}$ with slope k, of equation

$$y - y_1 = k(x - x_1).$$

Show that the values of k for which $l_k$ is a diagonal are also the values for which the restrictions of the two quadratic functions $y^2 + qx + r - \frac{p^2}{4}$ and $x^2 - y + \frac{p}{2}$ to $l_k$ are proportional. Show that this gives the equations

$$\frac{1}{k^2} = \frac{-k}{2k(-kx_1 + y_1) + q} = \frac{kx_1 - y_1 + p/2}{(kx_1 - y_1)^2 - p^2/4 + r},$$

which can be reduced to the single quadratic equation

$$k^2\left(x_1^2 - y_1 + \frac{p}{2}\right) = y_1^2 + qx_1 - \frac{p^2}{4} + r.$$

Now the full solution is at hand: compute $(m_1, x_1, y_1)$ and $(m_2, x_2, y_2)$; you can ignore the third root of the resolvent cubic or use it to check your answers. Then for each of these compute the slopes $k_{i,1}$ and $k_{i,2} = -k_{i,1}$ from the equation above. You now have four lines, two through A and two through B. Intersect them in pairs to find the four intersections of the parabolas.

(h) Solve the quartic equations

$$x^4 - 4x^2 + x + 1 = 0 \quad\text{and}\quad x^4 + 4x^3 + x - 1 = 0.$$
Vectors, Matrices, and Derivatives


It is sometimes said that the great discovery of the nineteenth century was that the equations of nature were linear, and the great discovery of the twentieth century is that they are not.

-Tom Körner, Fourier Analysis


1.0 Introduction 

In this chapter, we introduce the principal actors of linear algebra and multi- 
variable calculus. 

By and large, first year calculus deals with functions f that associate one
number f(x) to one number x. In most realistic situations, this is inadequate:
the description of most systems depends on many functions of many variables. 

In physics, a gas might be described by pressure and temperature as a func- 
tion of position and time, two functions of four variables. In biology, one might 
be interested in numbers of sharks and sardines as functions of position and 
time; a famous study of sharks and sardines in the Adriatic, described in The 
Mathematics of the Struggle for Life by Vito Volterra, founded the subject of 
mathematical ecology. 

In micro-economics, a company might be interested in production as a func- 
tion of input, where that function has as many coordinates as the number of 
products the company makes, each depending on as many inputs as the com- 
pany uses. Even thinking of the variables needed to describe a macro-economic 
model is daunting (although economists and the government base many deci- 
sions on such models). The examples are endless and found in every branch of 
science and social science. 

Mathematically, all such things are represented by functions f that take n numbers and return m numbers; such functions are denoted $f : \mathbb{R}^n \to \mathbb{R}^m$. In that generality, there isn’t much to say; we must impose restrictions on the functions we will consider before any theory can be elaborated.

The strongest requirement one can make is that f should be linear; roughly speaking, a function is linear if when you double the input, you double the
output. Such linear functions are fairly easy to describe completely, and a 
thorough understanding of their behavior is the foundation for everything else. 

The first four sections of this chapter are devoted to laying the foundations of linear algebra. We will introduce the main actors, vectors and matrices, relate them to the notion of function (which we will call transformation), and develop the geometrical language (length of a vector, length of a matrix, ...) that we will need in multi-variable calculus. In Section 1.5 we will discuss sequences, subsequences, limits and convergence. In Section 1.6 we will expand on that discussion, developing the topology needed for a rigorous treatment of calculus.

Most functions are not linear, but very often they are well approximated by linear functions, at least for some values of the variables. For instance, as long as there are few hares, their number may well double every year, but as soon as they become numerous, they will compete with each other, and their rate of increase (or decrease) will become more complex. In the last three sections of this chapter we will begin exploring how to approximate a nonlinear function by a linear function: specifically, by its higher-dimensional derivative.

1.1 Introducing the Actors: Vectors

Much of linear algebra and multivariate calculus takes place within $\mathbb{R}^n$. This is the space of ordered lists of n real numbers.

The notion that one can think about and manipulate higher dimensional spaces by considering a point in n-dimensional space as a list of its n “coordinates” did not always appear as obvious to mathematicians as it does today. In 1846, the English mathematician Arthur Cayley pointed out that a point with four coordinates can be interpreted geometrically without recourse to “any metaphysical notion concerning the possibility of four-dimensional space.”

You are probably used to thinking of a point in the plane in terms of its two coordinates: the familiar Cartesian plane with its x, y axes is $\mathbb{R}^2$. Similarly, a point in space (after choosing axes) is specified by its three coordinates: Cartesian space is $\mathbb{R}^3$. Analogously, a point in $\mathbb{R}^n$ is specified by its n coordinates; it is a list of n real numbers. Such ordered lists occur everywhere, from grades on a transcript to prices on the stock exchange.

Seen this way, higher dimensions are no more complicated than $\mathbb{R}^2$ and $\mathbb{R}^3$; the lists of coordinates just get longer. But it is not obvious how to think about such spaces geometrically. Even the experts understand such objects only by educated analogy to objects in $\mathbb{R}^2$ or $\mathbb{R}^3$; the authors cannot “visualize $\mathbb{R}^4$” and we believe that no one really can. The object of linear algebra is at least in part to extend to higher dimensions the geometric language and intuition we have concerning the plane and space, familiar to us all from everyday experience. It will enable us to speak for instance of the “space of solutions” of a particular system of equations as being a four-dimensional subspace of $\mathbb{R}^7$.

Example 1.1.1 (The stock market). The following data is from the Ithaca Journal, Dec. 14, 1996.

“Vol” denotes the number of shares traded, “High” and “Low,” the highest and lowest price paid per share, “Close,” the price when trading stopped at the end of the day, and “Chg,” the difference between the closing price and the closing price on the previous day.

Local NYSE Stocks

                   Vol      High      Low       Close     Chg
Airgas             193      24 1/2    23 1/8    23 5/8    -3/8
AT&T               36606    39 1/4    38 3/8    39        3/8
Borg Warner        74       38 3/8    38        38        -3/8
Corning            4575     44 3/4    43        44 1/4    1/2
Dow Jones          1606     33 1/4    32 1/2    33 1/4    1/8
Eastman Kodak      7774     80 5/8    79 1/4    79 3/8    -3/4
Emerson Elec.      3335     97 3/8    95 5/8    95 5/8    -1 1/8
Federal Express    5828     42 1/2    41        41 5/8    1 1/2
We can think of this table as five columns, each an element of $\mathbb{R}^8$:

$$\text{Vol} = \begin{bmatrix} 193 \\ 36606 \\ 74 \\ 4575 \\ 1606 \\ 7774 \\ 3335 \\ 5828 \end{bmatrix},\quad
\text{High} = \begin{pmatrix} 24\tfrac12 \\ 39\tfrac14 \\ 38\tfrac38 \\ 44\tfrac34 \\ 33\tfrac14 \\ 80\tfrac58 \\ 97\tfrac38 \\ 42\tfrac12 \end{pmatrix},\quad
\text{Low} = \begin{pmatrix} 23\tfrac18 \\ 38\tfrac38 \\ 38 \\ 43 \\ 32\tfrac12 \\ 79\tfrac14 \\ 95\tfrac58 \\ 41 \end{pmatrix},\quad
\text{Close} = \begin{pmatrix} 23\tfrac58 \\ 39 \\ 38 \\ 44\tfrac14 \\ 33\tfrac14 \\ 79\tfrac38 \\ 95\tfrac58 \\ 41\tfrac58 \end{pmatrix},\quad
\text{Chg} = \begin{bmatrix} -\tfrac38 \\ \tfrac38 \\ -\tfrac38 \\ \tfrac12 \\ \tfrac18 \\ -\tfrac34 \\ -1\tfrac18 \\ 1\tfrac12 \end{bmatrix}.$$

△

Each of these lists of eight numbers is an element of $\mathbb{R}^8$; if we were listing the full New York Stock Exchange, they would be elements of $\mathbb{R}^{3356}$.

Note that we write elements of $\mathbb{R}^n$ as columns, not rows. The reason for preferring columns will become clear later: we want the order of terms in matrix multiplication to be consistent with the notation f(x), where the function is placed before the variable, a notation established by the famous mathematician Euler. Note also that we use parentheses for “positional” data and brackets for “incremental” data; the distinction is discussed below.

The Swiss mathematician Leonhard Euler (1707-1783) touched on all aspects of the mathematics and physics of his time. He wrote textbooks on algebra, trigonometry, and infinitesimal calculus; all texts in these fields are in some sense rewrites of Euler’s. He set the notation we use from high school on: sin, cos, and tan for the trigonometric functions, f(x) to indicate a function of the variable x are all due to him. Euler’s complete works fill 85 large volumes, more than the number of mystery novels published by Agatha Christie; some were written after he became completely blind in 1771. Euler spent much of his professional life in St. Petersburg. He and his wife had thirteen children, five of whom survived to adulthood.


Points and vectors: positional data versus incremental data 

An element of $\mathbb{R}^n$ is simply an ordered list of n numbers, but such a list can be interpreted in two ways: as a point representing a position or as a vector representing a displacement or increment.

Definition 1.1.2 (Point, vector, and coordinates). The element of $\mathbb{R}^n$ with coordinates $x_1, x_2, \dots, x_n$ can be interpreted in two ways: as the point $\mathbf{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$, or as the vector $\vec{\mathbf{x}} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$, which represents an increment.

Example 1.1.3 (An element of $\mathbb{R}^2$ as a point and as a vector). The element of $\mathbb{R}^2$ with coordinates x = 2, y = 3 can be interpreted as the point $\begin{pmatrix} 2 \\ 3 \end{pmatrix}$ in the plane, as shown in Figure 1.1.1. But it can also be interpreted as the instructions “start anywhere and go two units right and three units up,” rather like instructions for a treasure hunt: “take two giant steps to the east, and three to the north”; this is shown in Figure 1.1.2. Here we are interested in the displacement: if we start at any point and travel $\begin{bmatrix} 2 \\ 3 \end{bmatrix}$, how far will we have gone, in what direction? When we interpret an element of $\mathbb{R}^n$ as a position, we call it a point; when we interpret it as a displacement, or increment, we call it a vector. △

Figure 1.1.1.

The point $\begin{pmatrix} 2 \\ 3 \end{pmatrix}$.

Figure 1.1.2.

All the arrows represent the same vector, $\begin{bmatrix} 2 \\ 3 \end{bmatrix}$.

As shown in Figure 1.1.2, in the plane (and in three-dimensional space) a vector can be depicted as an arrow pointing in the direction of the displacement. The amount of displacement is the length of the arrow. This does not extend well to higher dimensions. How are we to picture the “arrow” in $\mathbb{R}^{3356}$ representing the change in prices on the stock market? How long is it, and in what “direction” does it point? We will show how to compute these magnitudes and directions for vectors in $\mathbb{R}^n$ in Section 1.4.


Example 1.1.4 (A point as a state of a system). It is easy to think of a point in $\mathbb{R}^2$ or $\mathbb{R}^3$ as a position; in higher dimensions, it can be more helpful to think of a point as a “state” of a system. If 3356 stocks are listed on the New York Stock Exchange, the list of closing prices for those stocks is an element of $\mathbb{R}^{3356}$, and every element of $\mathbb{R}^{3356}$ is one theoretically possible state of the stock market. This corresponds to thinking of an element of $\mathbb{R}^{3356}$ as a point.

The list telling how much each stock gained or lost compared with the previous day is also an element of $\mathbb{R}^{3356}$, but this corresponds to thinking of the element as a vector, with direction and magnitude: did the price of each stock go up or down? How much? △

Remark. In physics textbooks and some first year calculus books, vectors are 
often said to represent quantities (velocity, forces) that have both “magnitude” 
and “direction,” while other quantities (length, mass, volume, temperature) 
have only “magnitude” and are represented by numbers (scalars). We think 
this focuses on the wrong distinction, suggesting that some quantities are always 
represented by vectors while others never are, and that it takes more information 
to specify a quantity with direction than one without. 

The volume of a balloon is a single number, but so is the vector expressing 
the difference in volume between an inflated balloon and one that has popped. 
The first is a number in R, while the second is a vector in R. The height of 
a child is a single number, but so is the vector expressing how much he has 
grown since his last birthday. A temperature can be a “magnitude,” as in “It 
got down to -20 last night,” but it can also have “magnitude and direction,” as 
in “It is 10 degrees colder today than yesterday.” Nor can “static” information 
always be expressed by a single number: the state of the Stock Market at a given instant requires as many numbers as there are stocks listed, as does the vector describing the change in the Stock Market from one day to the next. △


Points can’t be added; vectors can 

As a rule, it doesn’t make sense to add points together, any more than it makes sense to “add” the positions “Boston” and “New York” or the temperatures 50 degrees Fahrenheit and 70 degrees Fahrenheit. (If you opened a door between two rooms at those temperatures, the result would not be two rooms at 120 degrees!) But it does make sense to measure the difference between points (i.e., to subtract them): you can talk about the distance between Boston and New York, or about the difference in temperature between two rooms. The result of subtracting one point from another is thus a vector specifying the increment you need to add to get from one point to another.

We will not consistently use different notation for the point zero and the zero vector, although philosophically the two are quite different. The zero vector, i.e., the “zero increment,” has a universal meaning, the same regardless of the frame of reference. The point zero is arbitrary, just as “zero degrees” is arbitrary, and has a different meaning in the Centigrade system and in Fahrenheit.

Sometimes, often at a key point in the proof of a hard theorem, we will suddenly start thinking of points as vectors, or vice versa; this happens in the proof of Kantorovitch’s theorem in Appendix A.2, for example.

Figure 1.1.3.

The difference a − b between point a and point b is the vector joining them. The difference can be computed by subtracting the coordinates of b from those of a.

You can also add increments (vectors) together, giving another increment. 
For instance the vectors “advance five meters east then take two giant steps 
south” and “take three giant steps north and go seven meters west” can be 
added, to get “advance 2 meters west and one giant step north.” 

Similarly, in the NYSE table in Example 1.1.1, adding the Close columns on 
two successive days does not produce a meaningful answer. But adding the Chg 
columns for each day of a week produces a perfectly meaningful increment: the 
change in the market over that week. It is also meaningful to add increments 
to points (giving a point): adding a Chg column to the previous day’s Close 
column produces the current day’s Close — the new state of the system. 
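The bookkeeping in this paragraph can be sketched with exact fractions. The Close and Chg entries below are those of the first three stocks of Example 1.1.1; the next day's changes are hypothetical values made up for the illustration, and the helper names are ours.

```python
from fractions import Fraction as F

# Close and Chg for Airgas, AT&T and Borg Warner (Example 1.1.1).
close_today = [F(23) + F(5, 8), F(39), F(38)]   # a point: a state of the market
chg_today = [F(-3, 8), F(3, 8), F(-3, 8)]       # a vector: an increment


def add(p, v):
    """Entrywise sum: point + vector gives a point, vector + vector a vector."""
    return [a + b for a, b in zip(p, v)]


def subtract(p, q):
    """Entrywise difference: point - point (or point - vector)."""
    return [a - b for a, b in zip(p, q)]


# Yesterday's Close is today's Close minus today's Chg ...
close_yesterday = subtract(close_today, chg_today)
# ... and adding the increment back recovers today's state.
assert add(close_yesterday, chg_today) == close_today

# Increments add up: the change over two days (second day is hypothetical).
chg_tomorrow = [F(1, 4), F(-1, 8), F(0)]
two_day_change = add(chg_today, chg_tomorrow)
```

Nothing in the code stops us from computing `add(close_today, close_yesterday)`; the point of the text is that the result would have no meaning.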


To help distinguish these two kinds of elements of $\mathbb{R}^n$, we will denote them differently: points will be denoted by boldface lower case letters, and vectors will be lower case boldface letters with arrows above them. Thus $\mathbf{x}$ is a point in $\mathbb{R}^2$, while $\vec{\mathbf{x}}$ is a vector in $\mathbb{R}^2$. We do not distinguish between entries of points and entries of vectors; they are all written in plain type, with subscripts. However, when we write elements of $\mathbb{R}^n$ as columns, we will use parentheses for a point $\mathbf{x}$ and square brackets for a vector $\vec{\mathbf{x}}$: in $\mathbb{R}^2$, $\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ and $\vec{\mathbf{x}} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$.

Remark. An element of R n is an element of R n — i.e., an ordered list of 
numbers — whether it is interpreted as a point or as a vector. But we have very 
different images of points and vectors, and we hope that sharing them with you 
explicitly will help you build a sound intuition. In linear algebra, you should 
just think of elements of R n as vectors. However, differential calculus is all 
about increments to points. It is because the increments are vectors that linear 
algebra is a prerequisite for multivariate calculus: it provides the right language 
and tools for discussing these increments. 


Subtraction and addition of vectors and points 


The difference between point a and point b is the vector a - b, as shown in 
Figure 1.1.3. 

Vectors are added by adding the corresponding coordinates:

$$\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} + \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix} = \begin{bmatrix} v_1 + w_1 \\ v_2 + w_2 \\ \vdots \\ v_n + w_n \end{bmatrix};$$

the result is a vector. Similarly, vectors are subtracted by subtracting the corresponding coordinates to get a new vector. A point and a vector are added by adding the corresponding coordinates; the result is a point.

If we were working with complex vector spaces, our scalars would be complex numbers; in number theory, scalars might be the rational numbers; in coding theory, they might be elements of a finite field. (You may have run into such things under the name of “clock arithmetic.”) We use the word “scalar” rather than “real number” because most theorems in linear algebra are just as true for complex vector spaces or rational vector spaces as for real ones, and we don’t want to restrict the validity of the statements unnecessarily.

The symbol ∈ means “element of.” Out loud, one says “in.” The expression “x, y ∈ V” means “x ∈ V and y ∈ V.” If you are unfamiliar with the notation of set theory, see the discussion in Section 0.3.

In the plane, the sum v + w is the diagonal of the parallelogram of which 
two adjacent sides are v and w, as shown in Figure 1.1.4 (left). We can also 
add vectors by placing the beginning of one vector at the end of the other, as 
shown in Figure 1.1.4 (right). 




FIGURE 1.1.4. In the plane, the sum v + w is the diagonal of the parallelogram at 
left. We can also add them by putting them head to tail. 
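The rules of this subsection (point − point = vector, point + vector = point, vector ± vector = vector, and no point + point) can be encoded in types. The two classes below are a minimal sketch of our own devising, not notation used by the text; they make the forbidden sum fail instead of silently returning a list of numbers.

```python
class Vector:
    """An increment in R^n: vectors can be added and subtracted."""

    def __init__(self, *coords):
        self.coords = tuple(coords)

    def __add__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented
        return Vector(*(a + b for a, b in zip(self.coords, other.coords)))

    def __sub__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented
        return Vector(*(a - b for a, b in zip(self.coords, other.coords)))

    def __eq__(self, other):
        return isinstance(other, Vector) and self.coords == other.coords


class Point:
    """A position in R^n: points can be subtracted, but not added."""

    def __init__(self, *coords):
        self.coords = tuple(coords)

    def __sub__(self, other):
        if isinstance(other, Point):       # point - point = vector
            return Vector(*(a - b for a, b in zip(self.coords, other.coords)))
        return NotImplemented

    def __add__(self, other):
        if isinstance(other, Vector):      # point + vector = point
            return Point(*(a + b for a, b in zip(self.coords, other.coords)))
        return NotImplemented              # point + point raises TypeError

    def __eq__(self, other):
        return isinstance(other, Point) and self.coords == other.coords


# The treasure-hunt increments from the text, as (east, north) pairs:
# 5 east, 2 south, then 7 west, 3 north, gives 2 west and 1 north.
total = Vector(5, -2) + Vector(-7, 3)
```

Returning `NotImplemented` lets Python raise its usual `TypeError` for `Point + Point`.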


Multiplying vectors by scalars 

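Multiplication of a vector by a scalar is straightforward:

$$a \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} a x_1 \\ \vdots \\ a x_n \end{bmatrix}; \quad\text{for example, } \sqrt{3} \begin{bmatrix} 3 \\ -1 \\ 2 \end{bmatrix} = \begin{bmatrix} 3\sqrt{3} \\ -\sqrt{3} \\ 2\sqrt{3} \end{bmatrix}.$$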

In this book, our vectors will be lists of real numbers, so that our scalars — 
the kinds of numbers we are allowed to multiply vectors or matrices by — are 
real numbers. 
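In code, multiplying a vector by a scalar is a one-line entrywise operation. The sketch below represents a vector as a plain Python list of numbers (our choice, made only for the illustration) and scales the entries 3, −1, 2 by $\sqrt{3}$.

```python
import math


def scale(a, v):
    """Multiply each entry of the vector v (a list of numbers) by the scalar a."""
    return [a * x for x in v]


w = scale(math.sqrt(3), [3, -1, 2])
```

Because $\sqrt{3}$ is stored as a floating-point number, the entries of `w` are approximations of $3\sqrt{3}$, $-\sqrt{3}$ and $2\sqrt{3}$.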

Subspaces of $\mathbb{R}^n$

A subspace of $\mathbb{R}^n$ is a subset of $\mathbb{R}^n$ that is closed under addition and multiplication by scalars.¹ (This $\mathbb{R}^n$ should be thought of as made up of vectors, not points.)


¹In Section 2.6 we will discuss abstract vector spaces. These are sets in which one can add and multiply by scalars, and where these operations satisfy rules (ten of them) that make them clones of $\mathbb{R}^n$. Subspaces of $\mathbb{R}^n$ will be our main examples of vector spaces.
Definition 1.1.5 (Subspace of $\mathbb{R}^n$). A non-empty subset $V \subset \mathbb{R}^n$ is called
a subspace if it is closed under addition and closed under multiplication by
scalars; i.e., $V$ is a subspace if when

$\vec{x}, \vec{y} \in V$ and $a \in \mathbb{R}$, then $\vec{x} + \vec{y} \in V$ and $a\vec{x} \in V$.


To be closed under multiplication a subspace must contain the zero vector,
so that

$0\vec{v} = \vec{0}$.


For example, a straight line through the origin is a subspace of $\mathbb{R}^2$ and of $\mathbb{R}^3$.
A plane through the origin is a subspace of $\mathbb{R}^3$. The set consisting of just the
zero vector $\{\vec{0}\}$ is a subspace of any $\mathbb{R}^n$, and $\mathbb{R}^n$ is a subspace of itself. These
last two, $\{\vec{0}\}$ and $\mathbb{R}^n$, are considered trivial subspaces.

Intuitively, it is clear that a line that is a subspace has dimension 1, and 
a plane that is a subspace has dimension 2. Being precise about what this 
means requires some “machinery” (mainly the notions of linear independence 
and span), introduced in Section 2.4. 


The standard basis vectors 


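The notation for the standard basis vectors is ambiguous; at right we have
three different vectors, all denoted $\vec{e}_1$. The subscript tells us which entry
is 1 but does not say how many entries the vector has, i.e., whether it is a
vector in $\mathbb{R}^2$, $\mathbb{R}^3$ or what.


The standard basis vectors in $\mathbb{R}^2$ and $\mathbb{R}^3$ are often denoted $\vec{i}$, $\vec{j}$, and $\vec{k}$:

$$\vec{i} = \vec{e}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \text{ or } \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}; \qquad \vec{j} = \vec{e}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \text{ or } \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}; \qquad \vec{k} = \vec{e}_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$

We do not use this notation but mention it in case you encounter it elsewhere.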


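We will meet one particular family of vectors in $\mathbb{R}^n$ often: the standard basis
vectors. In $\mathbb{R}^2$ there are two standard basis vectors, $\vec{e}_1$ and $\vec{e}_2$; in $\mathbb{R}^3$, there are
three:

$$\text{in } \mathbb{R}^2: \ \vec{e}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \ \vec{e}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}; \qquad \text{in } \mathbb{R}^3: \ \vec{e}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \ \vec{e}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \ \vec{e}_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$

Similarly, in $\mathbb{R}^5$ there are five standard basis vectors:

$$\vec{e}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad \vec{e}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad \ldots, \quad \vec{e}_5 = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}.$$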


Definition 1.1.6 (Standard basis vectors). The standard basis vectors
in $\mathbb{R}^n$ are the vectors $\vec{e}_j$ with $n$ entries, the $j$th entry 1 and the others zero.
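
Because the subscript alone does not fix the number of entries, a function that builds a standard basis vector needs both $j$ and $n$. The following is a minimal sketch; the function name `standard_basis_vector` is hypothetical, and vectors are assumed to be plain lists.

```python
def standard_basis_vector(j, n):
    """Return e_j in R^n: n entries, the j-th entry (counting from 1) is 1."""
    if not 1 <= j <= n:
        raise ValueError("need 1 <= j <= n")
    return [1 if k == j else 0 for k in range(1, n + 1)]


# Three different vectors, all called e_1:
print(standard_basis_vector(1, 2))   # [1, 0]
print(standard_basis_vector(1, 3))   # [1, 0, 0]
print(standard_basis_vector(1, 5))   # [1, 0, 0, 0, 0]
```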

Geometrically, there is a close connection between the standard basis vectors
in $\mathbb{R}^2$ and a choice of axes in the Euclidean plane. When in school you drew
an x-axis and y-axis on a piece of paper and marked off units so that you could
plot a point, you were identifying the plane with $\mathbb{R}^2$: each point on the plane
corresponded to a pair of real numbers, its coordinates with respect to those
axes. A set of axes providing such an identification must have an origin, and
each axis must have a direction (so you know what is positive and what is
negative) and it must have units (so you know, for example, where x = 3 or
y = 2 is).

Such axes need not be at right angles, and the units on one axis need not
be the same as those on the other, as shown in Figure 1.1.5. However, the
identification is more useful if we choose the axes at right angles (orthogonal)
and the units equal; the plane with such axes, generally labeled x and y, is
known as the Cartesian plane. We can think that $\vec{e}_1$ measures one unit along
the x-axis, going to the right, and $\vec{e}_2$ measures one unit along the y-axis, going
“up.”

Figure 1.1.5. The point marked with a circle is the point | 2 I in this
non-orthogonal coordinate system.

Vector fields

Virtually all of physics deals with fields. The electric and magnetic fields of
electromagnetism, the gravitational and other force fields of mechanics, the
velocity fields of fluid flow, the wave function of quantum mechanics, are all
“fields.” Fields are also used in other subjects, epidemiology and population
studies, for instance.

By “field” we mean data that varies from point to point. Some fields, like 
temperature or pressure distribution, are scalar fields: they associate a number 
to every point. Some fields, like the Newtonian gravitation field, are best mod- 
eled by vector fields, which associate a vector to every point. Others, like the 
electromagnetic field and charge distributions, are best modeled by form fields,
discussed in Chapter 6. Still others, like the Einstein field of general relativity 
(a field of pseudo inner products), are none of the above. 

Definition 1.1.7 (Vector field). A vector field on $\mathbb{R}^n$ is a function whose
input is a point in $\mathbb{R}^n$ and whose output is a vector (also in $\mathbb{R}^n$) emanating
from that point.

We will distinguish between functions and vector fields by putting arrows on
vector fields, as in $\vec{F}$ in Example 1.1.8.

Figure 1.1.6. A vector field associates a vector to each point. Here we show
the radial vector field $\vec{F}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{bmatrix} x \\ y \end{bmatrix}$. Vector fields generally are easier to
depict when one scales the vectors down, as we have done above and in
Figure 1.1.7.

Example 1.1.8 (Vector fields in $\mathbb{R}^2$). The identity function in $\mathbb{R}^2$

$$f\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix}$$

takes a point in $\mathbb{R}^2$ and returns the same point. But the vector field

$$\vec{F}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{bmatrix} x \\ y \end{bmatrix} \qquad (1.1.4)$$

takes a point in $\mathbb{R}^2$ and assigns to it the vector corresponding to that point, as
shown in Figure 1.1.6. To the point with coordinates (1, 1) it assigns the vector
$\begin{bmatrix} 1 \\ 1 \end{bmatrix}$; to the point with coordinates (4, 2) it assigns the vector $\begin{bmatrix} 4 \\ 2 \end{bmatrix}$.
Actually, a vector field simply associates to each point a vector; how you
imagine that vector is up to you. But it is always helpful to imagine each
vector anchored at, or emanating from, the corresponding point.


Similarly, the vector field $\vec{F}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{bmatrix} xy - 2 \\ x - y \end{bmatrix}$, shown in Figure 1.1.7, takes
a point in $\mathbb{R}^2$ and assigns to it the vector $\begin{bmatrix} xy - 2 \\ x - y \end{bmatrix}$. $\triangle$


Vector fields are often used to describe the flow of fluids or gases: the vector 
assigned to each point gives the velocity and direction of the flow. For flows that 
don’t change over time ( steady-state flows), such a vector field gives a complete 
description. In more realistic cases where the flow is constantly changing, the 
vector field gives a snapshot of the flow at a given instant. Vector fields are also 
used to describe force fields such as electric fields or gravitational fields. 
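
To make the distinction between a point (the input) and a vector (the output) concrete, here is a small sketch of the radial vector field of Example 1.1.8. The representation is our own assumption: points are tuples, vectors are lists, and the name `radial_field` is hypothetical.

```python
def radial_field(x, y):
    """Radial vector field: the vector attached to the point (x, y) is [x, y]."""
    return [x, y]


# Sample the field at the two points used in Example 1.1.8.
for point in [(1, 1), (4, 2)]:
    print(point, "->", radial_field(*point))
```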


1.2 Introducing the Actors: Matrices 


Figure 1.1.7. The second vector field of Example 1.1.8.


When a matrix is described, 
height is given first, then width: 
an m x n matrix is m high and 
n wide. After struggling for years 
to remember which goes first, one 
of the authors hit on a mnemonic: 
first take the elevator, then walk 
down the hall. 


Probably no other area of mathematics has been applied in such numerous
and diverse contexts as the theory of matrices. In mechanics, electro-
magnetics, statistics, economics, operations research, the social sciences,
and so on, the list of applications seems endless. By and large this is
due to the utility of matrix structure and methodology in conceptualiz-
ing sometimes complicated relationships and in the orderly processing of
otherwise tedious algebraic calculations and numerical manipulations.
-James Cochran, Applied Mathematics: Principles, Techniques, and
Applications

The other central actor in linear algebra is the matrix. 

Definition 1.2.1 (Matrix). An $m \times n$ matrix is a rectangular array of
entries, $m$ high and $n$ wide.

We use capital letters to denote matrices. Usually our matrices will be arrays
of numbers, real or complex, but matrices can be arrays of polynomials, or of
more general functions; a matrix can even be an array of other matrices. A
vector $\vec{v} \in \mathbb{R}^m$ is an $m \times 1$ matrix.

Addition of matrices, and multiplication of a matrix by a scalar, work in the 
obvious way: 

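Example 1.2.2 (Addition of matrices and multiplication by a scalar).

$$\begin{bmatrix} 1 & 0 \\ 2 & -1 \\ 4 & 2 \end{bmatrix} + \begin{bmatrix} 0 & -3 \\ 1 & -2 \\ 3 & 1 \end{bmatrix} = \begin{bmatrix} 1 & -3 \\ 3 & -3 \\ 7 & 3 \end{bmatrix}$$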


- j U :]-[j :] 
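
Entry-by-entry addition and scalar multiplication can be sketched as follows. This assumes a matrix is stored as a list of rows; the names `mat_add` and `mat_scale` are hypothetical. The size check reflects the rule that matrices can be added only if they have the same height and the same width.

```python
def mat_add(A, B):
    """Add two matrices of the same height and width, entry by entry."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("matrices must have the same height and width")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mat_scale(c, A):
    """Multiply every entry of the matrix A by the scalar c."""
    return [[c * a for a in row] for row in A]


A = [[1, 0], [2, -1], [4, 2]]
B = [[0, -3], [1, -2], [3, 1]]
print(mat_add(A, B))      # [[1, -3], [3, -3], [7, 3]]
print(mat_scale(2, A))    # [[2, 0], [4, -2], [8, 4]]
```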


So far, it's not clear that matrices gain us anything. Why put numbers (or
other entries) into a rectangular array? What do we gain by talking about the
How would you add the matrices

$$\begin{bmatrix} 1 & 2 & 5 \\ 0 & 2 & 3 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 1 & 2 \\ 0 & 2 \end{bmatrix}?$$

You can’t: matrices can be added only if they have the same height and same
width.


Matrices were introduced by Arthur Cayley, a lawyer who became a
mathematician, in A Memoir on the Theory of Matrices, published in 1858.
He denoted the multiplication of a 3 x 3 matrix by the vector
$\begin{bmatrix} x \\ y \\ z \end{bmatrix}$ using the format

    ( a,   b,   c  )( x, y, z )
    | a',  b',  c'  |
    | a'', b'', c'' |


“ . . . when Werner Heisenberg 
discovered ‘matrix’ mechanics in 
1925, he didn’t know what a ma- 
trix was (Max Born had to tell 
him), and neither Heisenberg nor 
Born knew what to make of the 
appearance of matrices in the con- 
text of the atom.” — Manfred R. 
Schroeder, “Number Theory and 
the Real World,” Mathematical 
Intelligencer, Vol. 7, No. 4 


$2 \times 2$ matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ rather than the point $\begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} \in \mathbb{R}^4$? The answer is that the


matrix format allows another operation to be performed: matrix multiplication. 
We will see in Section 1.3 that every linear transformation corresponds to mul- 
tiplication by a matrix. This is one reason matrix multiplication is a natural 
and important operation; other important applications of matrix multiplication 
are found in probability theory and graph theory. 


Matrix multiplication is best learned by example. The simplest way to mul- 
tiply A times B is to write B above and to the right of A. Then the product 
AB fits in the space to the right of A and below B, the i,jth entry of AB 
being the intersection of the ith row of A and the jth column of B, as shown in 
Example 1.2.3. Note that for AB to exist, the width of A must equal the height 
of B. The resulting matrix then has the height of A and the width of B. 


Example 1.2.3 (Matrix multiplication). The first entry of the product
$AB$ is obtained by multiplying, one by one, the entries of the first row of $A$ by
those of the first column of $B$, and adding these products together: in Equation
1.2.1, $(2 \times 1) + (-1 \times 3) = -1$. The second entry is obtained by multiplying
the first row of $A$ by the second column of $B$: $(2 \times 4) + (-1 \times 0) = 8$. After
multiplying the first row of $A$ by all the columns of $B$, the process is repeated
with the second row of $A$: $(3 \times 1) + (2 \times 3) = 9$, and so on.


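$$[A][B] = [AB]$$

$$\begin{array}{cc}
 & \overbrace{\begin{bmatrix} 1 & 4 & -2 \\ 3 & 0 & 2 \end{bmatrix}}^{B} \\[6pt]
\underbrace{\begin{bmatrix} 2 & -1 \\ 3 & 2 \end{bmatrix}}_{A} &
\underbrace{\begin{bmatrix} -1 & 8 & -6 \\ 9 & 12 & -2 \end{bmatrix}}_{AB}
\end{array} \qquad (1.2.1)$$


Given the matrices

$$A = \begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix}, \quad B = \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & -1 & 1 \\ 1 & 0 & -1 \end{bmatrix}, \quad D = \begin{bmatrix} 1 & 0 \\ 2 & 2 \\ 1 & 1 \end{bmatrix},$$

what are the products $AB$, $AC$ and $CD$? Check your answers below.² Now
compute $BA$. What do you notice? What if you try to compute $CA$?³


²$AB = \begin{bmatrix} 0 & 1 \\ 0 & 5 \end{bmatrix}$; $AC = \begin{bmatrix} 1 & -1 & 1 \\ 5 & -2 & -1 \end{bmatrix}$; $CD = \begin{bmatrix} 0 & -1 \\ 0 & -1 \end{bmatrix}$.

³Matrix multiplication is not commutative; $BA = \begin{bmatrix} 2 & 3 \\ 2 & 3 \end{bmatrix}$, which is not equal to
$AB = \begin{bmatrix} 0 & 1 \\ 0 & 5 \end{bmatrix}$. Although the product $AC$ exists, you cannot compute $CA$.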
Below we state the formal definition of the process we’ve just described. If
the indices bother you, do refer to Figure 1.2.1.


Definition 1.2.4 (Matrix multiplication). If $A$ is an $m \times n$ matrix whose
$(i,j)$th entry is $a_{i,j}$, and $B$ is an $n \times p$ matrix whose $(i,j)$th entry is $b_{i,j}$,
then $C = AB$ is the $m \times p$ matrix with entries

$$c_{i,j} = \sum_{k=1}^{n} a_{i,k} b_{k,j} = a_{i,1} b_{1,j} + a_{i,2} b_{2,j} + \cdots + a_{i,n} b_{n,j}. \qquad (1.2.2)$$


Definition 1.2.4 says nothing new, but it provides some practice moving
between the concrete (multiplying two particular matrices) and the symbolic
(expressing this operation so that it applies to any two matrices of
appropriate dimensions, even if the entries are complex numbers or even
functions, rather than real numbers). In linear algebra one is constantly
moving from one form of representation (one “language”) to another. For
example, as we have seen, a point in $\mathbb{R}^n$ can be considered as a single entity,
$\mathbf{b}$, or as the ordered list of its coordinates; matrix $A$ can be thought of as
a single entity or as a rectangular array of its entries.

In Example 1.2.3, $A$ is a $2 \times 2$ matrix and $B$ is a $2 \times 3$ matrix, so that
$n = 2$, $m = 2$ and $p = 3$; the product $C$ is then a $2 \times 3$ matrix. If we set
$i = 2$ and $j = 3$, we see that the entry $c_{2,3}$ of the matrix $C$ is

$$c_{2,3} = a_{2,1} b_{1,3} + a_{2,2} b_{2,3} = (3 \cdot -2) + (2 \cdot 2) = -6 + 4 = -2.$$

Using the format for matrix multiplication shown in Example 1.2.3, the
$i,j$th entry is the entry at the intersection of the $i$th row and $j$th column.


FIGURE 1.2.1. The entry $c_{i,j}$ of the matrix $C = AB$ is the sum of the products of
the entries $a_{i,k}$ of the matrix $A$ and the corresponding entry $b_{k,j}$ of the matrix
$B$. The entries $a_{i,k}$ are all in the $i$th row of $A$; the first index $i$ is constant, and the
second index $k$ varies. The entries $b_{k,j}$ are all in the $j$th column of $B$; the first index
$k$ varies, and the second index $j$ is constant. Since the width of $A$ equals the height
of $B$, the entries of $A$ and those of $B$ can be paired up exactly.
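
The sum in the definition of matrix multiplication translates almost word for word into code. The following is a sketch only, assuming matrices are stored as lists of rows; the function name `matmul` is our own, and indices here start at 0 rather than 1.

```python
def matmul(A, B):
    """Product C = AB, where c[i][j] is the sum over k of a[i][k] * b[k][j]."""
    m, n, p = len(A), len(B), len(B[0])
    if any(len(row) != n for row in A):
        raise ValueError("the width of A must equal the height of B")
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]


# The matrices of Example 1.2.3.
A = [[2, -1],
     [3, 2]]
B = [[1, 4, -2],
     [3, 0, 2]]
print(matmul(A, B))   # [[-1, 8, -6], [9, 12, -2]]
```

Trying `matmul(B, A)` raises an error, since the width of `B` (3) does not equal the height of `A` (2).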


Remark. Often people write a problem in matrix multiplication in a row:
$[A][B] = [AB]$. The format shown in Example 1.2.3 avoids confusion: the
product of the $i$th row of $A$ and the $j$th column of $B$ lies at the intersection of
that row and column. It also avoids recopying matrices when doing repeated
Figure 1.2.2. The $i$th column of the product $AB$ depends on all the entries of
$A$ but only the $i$th column of $B$.

Figure 1.2.3. The $j$th row of the product $AB$ depends on all the entries of $B$ but
only the $j$th row of $A$.


multiplications, for example $A$ times $B$ times $C$ times $D$:

             [ B ]    [ C ]       [ D ]
    [ A ]    [ AB ]   [ (AB)C ]   [ (ABC)D ]          1.2.3


Multiplying a matrix by a standard basis vector 

Observe that multiplying a matrix $A$ by the standard basis vector $\vec{e}_i$ selects out
the $i$th column of $A$, as shown in the following example. We will use this fact
often.


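Example 1.2.5 (The $i$th column of $A$ is $A\vec{e}_i$). Below, we show that the
second column of $A$ is $A\vec{e}_2$:

$$\begin{array}{cc}
 & \overbrace{\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}}^{\vec{e}_2} \\[6pt]
\underbrace{\begin{bmatrix} 3 & -2 & 0 \\ 2 & 1 & 2 \\ 0 & 4 & 3 \\ 1 & 0 & 2 \end{bmatrix}}_{A} &
\underbrace{\begin{bmatrix} -2 \\ 1 \\ 4 \\ 0 \end{bmatrix}}_{A\vec{e}_2}
\end{array}
\qquad \text{multiplies the 2nd column by 1:} \quad
\begin{bmatrix} -2 \\ 1 \\ 4 \\ 0 \end{bmatrix} \times 1 = \begin{bmatrix} -2 \\ 1 \\ 4 \\ 0 \end{bmatrix}. \quad \triangle \qquad (1.2.4)$$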
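
As a quick check of this fact, the sketch below multiplies the matrix of Example 1.2.5 by each standard basis vector in turn. The helper `matvec` is a hypothetical name, and matrices are assumed to be stored as lists of rows.

```python
def matvec(A, v):
    """Multiply the matrix A (a list of rows) by the column vector v."""
    if any(len(row) != len(v) for row in A):
        raise ValueError("the width of A must equal the number of entries of v")
    return [sum(a * x for a, x in zip(row, v)) for row in A]


A = [[3, -2, 0],
     [2, 1, 2],
     [0, 4, 3],
     [1, 0, 2]]

for i in range(3):
    e_i = [1 if k == i else 0 for k in range(3)]   # standard basis vector
    column_i = [row[i] for row in A]               # column read off directly
    print(matvec(A, e_i), column_i)                # the two lists agree
```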


Similarly, the $i$th column of $AB$ is $A\vec{b}_i$, where $\vec{b}_i$ is the $i$th column of $B$, as
shown in Example 1.2.6 and represented in Figure 1.2.2. The $j$th row of $AB$ is
the product of the $j$th row of $A$ and the matrix $B$, as shown in Example 1.2.7
and Figure 1.2.3.


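Example 1.2.6. The second column of the product $AB$ is the same as the
product of the matrix $A$ and the second column of $B$:

$$AB = \begin{bmatrix} 2 & -1 \\ 3 & 2 \end{bmatrix} \begin{bmatrix} 1 & 4 & -2 \\ 3 & 0 & 2 \end{bmatrix} = \begin{bmatrix} -1 & 8 & -6 \\ 9 & 12 & -2 \end{bmatrix}; \qquad \begin{bmatrix} 2 & -1 \\ 3 & 2 \end{bmatrix} \begin{bmatrix} 4 \\ 0 \end{bmatrix} = \begin{bmatrix} 8 \\ 12 \end{bmatrix}. \qquad (1.2.5)$$


Example 1.2.7. The second row of the product $AB$ is the same as the product
of the second row of $A$ and the matrix $B$:

$$AB = \begin{bmatrix} 2 & -1 \\ 3 & 2 \end{bmatrix} \begin{bmatrix} 1 & 4 & -2 \\ 3 & 0 & 2 \end{bmatrix} = \begin{bmatrix} -1 & 8 & -6 \\ 9 & 12 & -2 \end{bmatrix}; \qquad \begin{bmatrix} 3 & 2 \end{bmatrix} \begin{bmatrix} 1 & 4 & -2 \\ 3 & 0 & 2 \end{bmatrix} = \begin{bmatrix} 9 & 12 & -2 \end{bmatrix}. \qquad (1.2.6)$$


In his 1858 article on matrices, Cayley stated that matrix multiplication is
associative but gave no proof. The impression one gets is that he played
around with matrices (mostly 2 x 2 and 3 x 3) to get some feeling for how they
behave, without worrying about rigor. Concerning another matrix result (the
Cayley-Hamilton theorem) he verifies it for 3 x 3 matrices, adding “I have not
thought it necessary to undertake the labour of a formal proof of the theorem
in the general case of a matrix of any degree.”


Matrix multiplication is associative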


When multiplying the matrices $A$, $B$, and $C$, we could set up the repeated
multiplication as we did in Equation 1.2.3, which corresponds to the product
$(AB)C$. We can use another format to get the product $A(BC)$:



                      [ C ]
             [ B ]    [ BC ]
    [ A ]             [ A(BC) ]                       1.2.7


Is $(AB)C$ the same as $A(BC)$? In Section 1.3 we give a conceptual reason why
they are; here we give a computational proof.


Figure 1.2.4. This way of writing the matrices corresponds to calculating
$(AB)C$.

Figure 1.2.5. This way of writing the matrices corresponds to calculating
$A(BC)$.


Proposition 1.2.8 (Matrix multiplication is associative). If $A$ is an
$n \times m$ matrix, $B$ is an $m \times p$ matrix and $C$ is a $p \times q$ matrix, so that $(AB)C$
and $A(BC)$ are both defined, then they are equal:

$$(AB)C = A(BC). \qquad (1.2.8)$$


Proof. Figures 1.2.4 and 1.2.5 show that the $i,j$th entry of both $A(BC)$ and
$(AB)C$ depends only on the $i$th line of $A$ and the $j$th column of $C$ (but on all the
entries of $B$), and that without loss of generality we can assume that $A$ is a line
matrix and that $C$ is a column matrix, i.e., that $n = q = 1$, so that both $(AB)C$
and $A(BC)$ are numbers. The proof is now an application of associativity of
multiplication of numbers:

$$(AB)C = \sum_{l=1}^{p} \underbrace{\Bigl( \sum_{k=1}^{m} a_k b_{k,l} \Bigr)}_{l\text{th entry of } AB} c_l = \sum_{l=1}^{p} \sum_{k=1}^{m} a_k b_{k,l} c_l = \sum_{k=1}^{m} a_k \underbrace{\Bigl( \sum_{l=1}^{p} b_{k,l} c_l \Bigr)}_{k\text{th entry of } BC} = A(BC). \qquad (1.2.9)$$
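
A numerical spot check is no substitute for the proof, but it shows the two bracketings producing the same matrix. The sketch below assumes matrices stored as lists of rows; `matmul` is a hypothetical helper, and the third matrix `C` is an arbitrary 3 x 2 matrix chosen for illustration.

```python
def matmul(A, B):
    """Product of A (m x n) and B (n x p), both stored as lists of rows."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("the width of A must equal the height of B")
    return [[sum(row[k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for row in A]


A = [[2, -1], [3, 2]]            # 2 x 2
B = [[1, 4, -2], [3, 0, 2]]      # 2 x 3
C = [[1, 0], [2, 2], [1, 1]]     # 3 x 2, chosen arbitrarily

left = matmul(matmul(A, B), C)   # (AB)C
right = matmul(A, matmul(B, C))  # A(BC)
print(left)                      # [[9, 10], [31, 22]]
print(left == right)             # True
```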
Non-commutativity of matrix multiplication


Exercise 1.2.2 provides prac- 
tice on matrix multiplication. At 
the end of this section, Example 
1.2.22, involving graphs, shows a 
setting where matrix multiplica- 
tion is a natural and powerful tool. 


As we saw earlier, matrix multiplication is most definitely not commutative. It 
may well be possible to multiply A by B but not B by A. Even if both matrices 
have the same number of rows and columns, AB will usually not equal BA , as 
shown in Example 1.2.9. 


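Example 1.2.9 (Matrix multiplication is not commutative). If you
multiply the matrix $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ by the matrix $\begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}$, the answer you get will
depend on which one you put first:

$$\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \quad \text{is not equal to} \quad \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}. \quad \triangle \qquad (1.2.10)$$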


The identity matrix 


The main diagonal is also called 
the diagonal. The diagonal from 
bottom left to top right is the anti- 
diagonal. 


The identity matrix $I$ plays the same role in matrix multiplication as the number
1 does in multiplication of numbers: $IA = A = AI$.

Definition 1.2.10 (Identity matrix). The identity matrix $I_n$ is the $n \times n$
matrix with 1’s along the main diagonal (the diagonal from top left to bottom
right) and 0’s elsewhere.


Multiplication by the identity 
matrix I does not change the ma- 
trix being multiplied. 


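The columns of the identity matrix $I_n$ are of course the standard basis
vectors $\vec{e}_1, \ldots, \vec{e}_n$:

$$I_4 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad \text{with columns } \vec{e}_1, \vec{e}_2, \vec{e}_3, \vec{e}_4.$$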


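For example,

$$I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad I_4 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}. \qquad (1.2.11)$$

If $A$ is an $n \times m$ matrix, then

$$IA = AI = A, \quad \text{or, more precisely,} \quad I_n A = A I_m = A, \qquad (1.2.12)$$

since if $n \neq m$ one must change the size of the identity matrix to match the
size of $A$. When the context is clear, we will omit the index.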


Matrix inverses 


The inverse $A^{-1}$ of a matrix $A$ plays the same role in matrix multiplication as
the inverse $1/a$ does for the number $a$. We will see in Section 2.3 that we can
use the inverse of a matrix to solve systems of linear equations.
The only number that does not have an inverse is 0, but many matrices do
not have inverses. In addition, the non-commutativity of matrix multiplication
makes the definition more complicated.

Definition 1.2.11 (Left and right inverses of matrices). Let $A$ be a
matrix. If there is another matrix $B$ such that

$$BA = I,$$

then $B$ is called a left inverse of $A$. If there is another matrix $C$ such that

$$AC = I,$$

then $C$ is called a right inverse of $A$.


We will see in Section 2.3 that 
only square matrices can have a 
two-sided inverse, i.e., an inverse. 
Furthermore, if a square matrix 
has a left inverse then that left in- 
verse is necessarily also a right in- 
verse; similarly, if it has a right in- 
verse, that right inverse is neces- 
sarily a left inverse. 


It is possible for a nonzero matrix to have neither a right nor a left inverse. 

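Example 1.2.12 (A matrix with neither right nor left inverse). The
matrix $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ does not have a right or a left inverse. To see this, assume it
has a right inverse. Then there exists a matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ such that

$$\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. \qquad (1.2.13)$$

But that product is $\begin{bmatrix} a & b \\ 0 & 0 \end{bmatrix}$, i.e., in the bottom right-hand corner, 0 = 1. A
similar computation shows that there is no left inverse. $\triangle$

It is possible for a non-square matrix to have lots of left inverses and no
right inverse, or lots of right inverses and no left inverse, as explored in
Exercise 1.2.20.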

Definition 1.2.13 (Invertible matrix). An invertible matrix is a matrix 
that has both a left inverse and a right inverse. 


While we can write the inverse of a number $x$ either as $x^{-1}$ or as $1/x$,
giving $x x^{-1} = x(1/x) = 1$, the inverse of a matrix $A$ is only written $A^{-1}$.
We cannot divide by a matrix. If for two matrices $A$ and $B$ you were to write
$A/B$, it would be unclear whether this meant

$$B^{-1}A \quad \text{or} \quad AB^{-1}.$$


Associativity of matrix multiplication gives us the following result: 

Proposition and Definition 1.2.14. If a matrix $A$ has both a left and a
right inverse, then it has only one left inverse and one right inverse, and they
are identical; such a matrix is called the inverse of $A$ and is denoted $A^{-1}$.

Proof. If a matrix $A$ has a right inverse $B$, then $AB = I$. If it has a left
inverse $C$, then $CA = I$. So

$$C(AB) = CI = C \quad \text{and} \quad (CA)B = IB = B, \quad \text{so } C = B. \quad \Box \qquad (1.2.14)$$


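We discuss how to find inverses of matrices in Section 2.3. A formula exists
for $2 \times 2$ matrices: the inverse of

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad \text{is} \quad A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}, \qquad (1.2.15)$$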
We are indebted to Robert Terrell for the mnemonic, “socks on, shoes on;
shoes off, socks off.” To undo a process, you undo first the last thing you did.


If $\vec{v} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$, then its transpose is $\vec{v}^{\,T} = \begin{bmatrix} 1 & 0 & 1 \end{bmatrix}$.


Do not confuse a matrix with its transpose, and in particular, never write a
vector horizontally. If you write a vector horizontally you have actually
written its transpose; confusion between a vector (or matrix) and its
transpose leads to endless difficulties with the order in which things should
be multiplied, as you can see from Theorem 1.2.17.


as Exercise 1.2.12 asks you to confirm by matrix multiplication of $AA^{-1}$ and
$A^{-1}A$. (Exercise 1.4.12 discusses the formula for the inverse of a $3 \times 3$ matrix.)

Notice that a $2 \times 2$ matrix is invertible if $ad - bc \neq 0$. The converse is also
true: if $ad - bc = 0$, the matrix is not invertible, as you are asked to show in
Exercise 1.2.13.
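
The formula for the inverse of a 2 x 2 matrix can be tried out directly. The sketch below is an illustration under our own assumptions: entries are integers, exact arithmetic is done with Python's `fractions.Fraction`, and the names `inverse_2x2` and `matmul` are hypothetical.

```python
from fractions import Fraction


def inverse_2x2(a, b, c, d):
    """Inverse of [[a, b], [c, d]] for integer entries, using the ad - bc formula."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("ad - bc = 0: the matrix is not invertible")
    f = Fraction(1, det)
    return [[f * d, -f * b], [-f * c, f * a]]


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(row[k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for row in A]


A = [[2, -1], [3, 2]]                 # ad - bc = 7
A_inv = inverse_2x2(2, -1, 3, 2)
print(matmul(A, A_inv) == [[1, 0], [0, 1]])   # True
print(matmul(A_inv, A) == [[1, 0], [0, 1]])   # True
```

Calling `inverse_2x2(1, 0, 0, 0)` raises an error, matching the fact that this matrix has no inverse.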

Associativity of matrix multiplication is also used to prove that the inverse 
of the product of two invertible matrices is the product of their inverses, in 
reverse order: 

Proposition 1.2.15 (The inverse of the product of matrices). If $A$
and $B$ are invertible matrices, then $AB$ is invertible, and the inverse is given
by the formula

$$(AB)^{-1} = B^{-1}A^{-1}. \qquad (1.2.16)$$

Proof. The computation

$$(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AA^{-1} = I \qquad (1.2.17)$$

and a similar one for $(B^{-1}A^{-1})(AB)$ prove the result. $\Box$

Where was associativity used in the proof? Check your answer below.⁴


The transpose 

The transpose is an operation on matrices that will be useful when we come to 
the dot product, and in many other places. 

Definition 1.2.16 (Transpose). The transpose $A^T$ of a matrix $A$ is formed
by interchanging all the rows and columns of $A$, reading the rows from left
to right, and columns from top to bottom.


For example, if $A = \begin{bmatrix} 1 & 4 & -2 \\ 3 & 0 & 2 \end{bmatrix}$, then $A^T = \begin{bmatrix} 1 & 3 \\ 4 & 0 \\ -2 & 2 \end{bmatrix}$.

The transpose of a single row of a matrix is a vector; we will use this in 
Section 1.4. 


⁴Associativity is used for the first two equalities below:

$$\underbrace{(AB)}_{(DE)} \underbrace{(B^{-1}A^{-1})}_{F} = \underbrace{A}_{D} \underbrace{\bigl(B(B^{-1}A^{-1})\bigr)}_{(EF)} = A\bigl((BB^{-1})A^{-1}\bigr) = A(IA^{-1}) = I.$$
Theorem 1.2.17 (The transpose of a product). The transpose of a
product is the product of the transposes in reverse order:

$$(AB)^T = B^T A^T. \qquad (1.2.18)$$

The proof of Theorem 1.2.17 is straightforward and is left as
Exercise 1.2.14.
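
The reverse-order rule for transposes is easy to check on the matrices of Example 1.2.3. This is a sketch under the assumption that matrices are lists of rows; `transpose` and `matmul` are hypothetical helper names.

```python
def transpose(A):
    """Interchange the rows and columns of A (a list of rows)."""
    return [list(column) for column in zip(*A)]


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(row[k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for row in A]


A = [[2, -1], [3, 2]]
B = [[1, 4, -2], [3, 0, 2]]

print(transpose(B))                                   # [[1, 3], [4, 0], [-2, 2]]
print(transpose(matmul(A, B)))                        # [[-1, 9], [8, 12], [-6, -2]]
print(matmul(transpose(B), transpose(A)))             # the same matrix
```

Note that the product in the original order, `matmul(transpose(A), transpose(B))`, is not even the right size here: it multiplies a 2 x 2 matrix by a 3 x 2 matrix, whose height does not match.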


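$$\begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 3 \\ 0 & 3 & 0 \end{bmatrix}$$

A symmetric matrix

$$\begin{bmatrix} 0 & 1 & 2 \\ -1 & 0 & 3 \\ -2 & -3 & 0 \end{bmatrix}$$

An anti-symmetric matrix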


Some special kinds of matrices 


Definition 1.2.18 (Symmetric matrix). A symmetric matrix is equal to 
its transpose. An anti-symmetric matrix is equal to minus its transpose. 

Definition 1.2.19 (Triangular matrix). An upper triangular matrix is a
square matrix with nonzero entries only on or above the main diagonal. A
lower triangular matrix is a square matrix with nonzero entries only on or
below the main diagonal.


$$\begin{bmatrix} 1 & 1 & 0 & 3 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

An upper triangular matrix


Definition 1.2.20 (Diagonal matrix). A diagonal matrix is a square 
matrix with nonzero entries (if any) only on the main diagonal. 


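What happens if you square the diagonal matrix $\begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix}$? If you cube it?⁵


$$\begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

A diagonal matrix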

Exercise 1.2.10 asks you to show 
that if A and B are upper trian- 
gular n x n matrices, then so is 
AB. 


Applications of matrix multiplication: probabilities and graphs 

While from the perspective of this book matrices are most important because 
they represent linear transformations, discussed in the next section, there are 
other important applications of matrix multiplication. Two good examples are 
probability theory and graph theory. 

Example 1.2.21 (Matrices and probabilities). Suppose you have three
reference books on a shelf: a thesaurus, a French dictionary, and an English
dictionary. Each time you consult one of these books, you put it back on the
shelf at the far left. When you need a reference, we denote the probability that
it will be the thesaurus $P_1$, the French dictionary $P_2$ and the English dictionary
$P_3$. There are six possible arrangements on the shelf: 123 (thesaurus, French
dictionary, English dictionary), 132, and so on.


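⁵$\begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix}^2 = \begin{bmatrix} a^2 & 0 \\ 0 & a^2 \end{bmatrix}$; $\begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix}^3 = \begin{bmatrix} a^3 & 0 \\ 0 & a^3 \end{bmatrix}$.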



For example, the move from (2 1 3) to (3 2 1) has probability $P_3$ (associated
with the English dictionary), since if you start with the order (2 1 3) (French
dictionary, thesaurus, English dictionary), consult the English dictionary,
and put it back to the far left, you will then have the order (3 2 1). So the
entry at the 3rd row, 6th column is $P_3$. The move from (2 1 3) to (3 1 2) has
probability 0, since moving the English dictionary won’t change the position
of the other books. So the entry at the 3rd row, 5th column is 0.


A situation like this one, where each outcome depends only on the one just
before it, is called a Markov chain.


Sometimes easy access isn’t the goal. In Zola’s novel Au Bonheur des Dames,
the epic story of the growth of the first big department store in Paris, the
hero has an inspiration: he places his merchandise in the most inconvenient
arrangement possible, forcing his customers to pass through parts of the store
where they otherwise wouldn’t set foot, and which are mined with temptations
for impulse shopping.
We can then write the following $6 \times 6$ transition matrix, indicating the
probability of going from one arrangement to another:

$$T = \begin{array}{c|cccccc}
 & (1,2,3) & (1,3,2) & (2,1,3) & (2,3,1) & (3,1,2) & (3,2,1) \\ \hline
(1,2,3) & P_1 & 0 & P_2 & 0 & P_3 & 0 \\
(1,3,2) & 0 & P_1 & P_2 & 0 & P_3 & 0 \\
(2,1,3) & P_1 & 0 & P_2 & 0 & 0 & P_3 \\
(2,3,1) & P_1 & 0 & 0 & P_2 & 0 & P_3 \\
(3,1,2) & 0 & P_1 & 0 & P_2 & P_3 & 0 \\
(3,2,1) & 0 & P_1 & 0 & P_2 & 0 & P_3
\end{array}$$


Now say you start with the fourth arrangement, (2,3,1). Multiplying the line
matrix (0, 0, 0, 1, 0, 0) (probability 1 for the fourth choice, 0 for the others) by the
transition matrix $T$ gives the probabilities $P_1, 0, 0, P_2, 0, P_3$. This is of course
just the 4th row of the matrix. The interesting point here is to explore the
long-term probabilities. At the second step, we would multiply the line matrix
$(P_1, 0, 0, P_2, 0, P_3)$ by $T$; at the third we would multiply that product by $T$, and so on.
If we know actual values for $P_1$, $P_2$, and $P_3$ we can compute the probabilities
for the various configurations after a great many iterations. If we don’t know
the probabilities, we can use this system to deduce them from the configuration
of the bookshelf after different numbers of iterations.
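
The repeated multiplication by the transition matrix can be sketched as follows. The probabilities 1/2, 1/3 and 1/6 are invented purely for illustration, and the names `consult` and `step` are hypothetical; the matrix is built from the "put it back at the far left" rule rather than typed in by hand.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical probabilities for thesaurus (1), French (2), English (3) dictionary.
P = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}

# The six arrangements, in the same order as the rows of the transition matrix.
states = list(permutations((1, 2, 3)))


def consult(state, book):
    """Arrangement after taking `book` out and putting it back at the far left."""
    return (book,) + tuple(b for b in state if b != book)


# T[r][c] = probability of going from arrangement r to arrangement c in one step.
T = [[sum((P[b] for b in P if consult(s, b) == t), Fraction(0)) for t in states]
     for s in states]


def step(row, T):
    """Multiply a line matrix (row vector) by T."""
    return [sum(row[r] * T[r][c] for r in range(len(T))) for c in range(len(T))]


row = step([0, 0, 0, 1, 0, 0], T)   # start in the fourth arrangement, (2, 3, 1)
print(row == T[3])                  # True: one step gives the 4th row of T
for _ in range(50):                 # many more steps: long-term probabilities
    row = step(row, T)
print([float(p) for p in row])
```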

This kind of approach is useful in determining efficient storage. How should a 
lumber yard store different sizes and types of woods, so as little time as possible 
is lost digging out a particular plank from under others? For computers, what 
applications should be easier to access than others? Based on the way you use 
your computer, how should its operating system store data most efficiently? $\triangle$

Example 1.2.22 is important for many applications. It introduces no new 
theory and can be skipped if time is at a premium, but it provides an enter- 
taining setting for practice at matrix multiplication, while showing some of its 
power. 

Example 1.2.22 (Matrices and graphs). We are going to take walks on
the edges of a unit cube; if in going from a vertex $V_i$ to another vertex $V_j$ we
walk along $n$ edges, we will say that our walk is of length $n$. For example, in
Figure 1.2.6, if we go from vertex $V_1$ to another vertex, passing by $V_4$ and $V_5$, the total
length of our walk is 3. We will stipulate that each segment of the walk has to
take us from one vertex to a different vertex; the shortest possible walk from a
vertex to itself is of length 2.

How many walks of length n are there that go from a vertex to itself, or, more 
generally, from a given vertex to a second vertex? As we will see in Proposition 
1.2.23, we answer that question by raising to the nth power the adjacency 
matrix of the graph. The adjacency matrix for our cube is the 8x8 matrix 
whose rows and columns are labeled by the vertices $V_1, \ldots, V_8$, and such that
the $i,j$th entry is 1 if there is an edge joining $V_i$ to $V_j$, and 0 if not, as shown in
Figure 1.2.6. For example, the entry 4, 1 is 1 (underlined in the matrix) because
there is an edge joining $V_4$ to $V_1$; the entry 4, 6 is 0 (also underlined) because
there is no edge joining $V_4$ to $V_6$.

$$A = \begin{array}{c|cccccccc}
 & V_1 & V_2 & V_3 & V_4 & V_5 & V_6 & V_7 & V_8 \\ \hline
V_1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 0 \\
V_2 & 1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\
V_3 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 \\
V_4 & \underline{1} & 0 & 1 & 0 & 1 & \underline{0} & 0 & 0 \\
V_5 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 1 \\
V_6 & 1 & 0 & 0 & 0 & 1 & 0 & 1 & 0 \\
V_7 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 1 \\
V_8 & 0 & 0 & 1 & 0 & 1 & 0 & 1 & 0
\end{array}$$

FIGURE 1.2.6. Left: The graph of a cube. Right: Its adjacency matrix $A$. If two
vertices $V_i$ and $V_j$ are joined by a single edge, the $(i,j)$th and $(j,i)$th entries of the
matrix are 1; otherwise they are 0.

The reason this matrix is important is the following.


You may appreciate this result 
more if you try to make a rough 
estimate of the number of walks 
of length 4 from a vertex to itself. 
The authors did and discovered 
later that they had missed quite 
a few possible walks. 


Proposition 1.2.23. For any graph formed of vertices connected by edges,
the number of possible walks of length $n$ from vertex $V_i$ to vertex $V_j$ is given
by the $i,j$th entry of the matrix $A^n$ formed by taking the $n$th power of the
graph's adjacency matrix $A$.

For example, there are 20 different walks of length 4 from V5 to V 7 (or vice 
versa), but no walks of length 4 from V 4 to V 3 because 


$$
A^4 = \begin{bmatrix}
21 & 0 & 20 & 0 & 20 & 0 & 20 & 0 \\
0 & 21 & 0 & 20 & 0 & 20 & 0 & 20 \\
20 & 0 & 21 & 0 & 20 & 0 & 20 & 0 \\
0 & 20 & 0 & 21 & 0 & 20 & 0 & 20 \\
20 & 0 & 20 & 0 & 21 & 0 & 20 & 0 \\
0 & 20 & 0 & 20 & 0 & 21 & 0 & 20 \\
20 & 0 & 20 & 0 & 20 & 0 & 21 & 0 \\
0 & 20 & 0 & 20 & 0 & 20 & 0 & 21
\end{bmatrix}.
$$

As you would expect, all the 1's in the adjacency matrix $A$ have
turned into 0's in $A^4$; if two vertices are connected by a single
edge, then when $n$ is even there will be no walks of length $n$
between them.

Of course we used a computer to compute this matrix. For all
but simple problems involving matrix multiplication, use Matlab or
an equivalent.

Proof. This will be proved by induction, in the context of the graph above;
the general case is the same. Let $B_n$ be the $8 \times 8$ matrix whose $i,j$th entry is
the number of walks from $V_i$ to $V_j$ of length $n$, for a graph with eight vertices;
we must prove $B_n = A^n$. First notice that $B_1 = A^1 = A$: the number $A_{i,j}$ is
exactly the number of walks of length 1 from $V_i$ to $V_j$.

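Next, suppose it is true for $n$, and let us see it for $n + 1$. A walk of length
$n + 1$ from $V_i$ to $V_j$ must be at some vertex $V_k$ at time $n$. The number of such
walks is the sum, over all such $V_k$, of the number of ways of getting from $V_i$ to
$V_k$ in $n$ steps, times the number of ways of getting from $V_k$ to $V_j$ in one step.
The latter number is 1 if $V_k$ is next to $V_j$, and 0 otherwise. In symbols, this becomes

$$
\underbrace{(B_{n+1})_{i,j}}_{\substack{\text{No. of ways} \\ i \text{ to } j \text{ in } n+1 \text{ steps}}}
= \underbrace{\sum_{k=1}^{8}}_{\substack{\text{for all} \\ \text{vertices } k}}
\underbrace{(B_n)_{i,k}}_{\substack{\text{No. ways } i \text{ to} \\ k \text{ in } n \text{ steps}}}
\underbrace{(B_1)_{k,j}}_{\substack{\text{No. ways } k \text{ to} \\ j \text{ in 1 step}}}
= \sum_{k=1}^{8} \underbrace{(A^n)_{i,k}}_{\substack{\text{inductive} \\ \text{hypothesis}}}
\underbrace{A_{k,j}}_{\text{def. of } A}
= (A^{n+1})_{i,j}, \tag{1.2.19}
$$

which is precisely the definition of $A^{n+1}$. □

Like the transition matrices of probability theory, matrices
representing the length of walks from one vertex of a graph to another
have important applications for computers and multiprocessing.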
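
Proposition 1.2.23 can be checked numerically for the cube. The following is a minimal sketch in plain Python, not the computation the authors ran: the helper names `mat_mul`, `mat_pow`, and `count_walks` are illustrative, and vertex $V_i$ is stored at index $i - 1$. It compares the entries of $A^4$ with a brute-force enumeration of walks.

```python
# Adjacency matrix of the cube from Figure 1.2.6; row/column i-1 stands for V_i.
A = [
    [0, 1, 0, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0, 0, 0, 1],
    [1, 0, 1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 1, 0, 1],
    [1, 0, 0, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 0, 1, 0],
]


def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    size = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_pow(X, n):
    """X to the power n (n >= 1), multiplying one factor at a time on the right."""
    result = X
    for _ in range(n - 1):
        result = mat_mul(result, X)
    return result


def count_walks(adj, start, end, n):
    """Count walks of length n from start to end by following every edge."""
    if n == 0:
        return 1 if start == end else 0
    return sum(count_walks(adj, k, end, n - 1)
               for k in range(len(adj)) if adj[start][k])


A4 = mat_pow(A, 4)
print(A4[4][6])  # walks of length 4 from V5 to V7: 20
print(A4[3][2])  # walks of length 4 from V4 to V3: 0
print(all(count_walks(A, i, j, 4) == A4[i][j]
          for i in range(8) for j in range(8)))  # True
```

The enumeration visits every walk explicitly, so its cost grows like $3^n$ for the cube, while the matrix power needs only $n - 1$ matrix products.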

Above, what do we mean by $A^n$? If you look at the proof, you will see that
what we used was

$$
A^n = \underbrace{(\dots((A)A)A\dots)A}_{n \text{ factors}}. \tag{1.2.20}
$$

Matrix multiplication is associative, so you can also put the parentheses any
way you want; for example,

$$
A^n = (A(A(A\dots))). \tag{1.2.21}
$$


In this case, we can see that it is true, and simultaneously make the associativity
less abstract: with the definition above, $B_n B_m = B_{n+m}$. Indeed, a walk of
length $n + m$ from $V_i$ to $V_j$ is a walk of length $n$ from $V_i$ to some $V_k$, followed
by a walk of length $m$ from $V_k$ to $V_j$. In formulas, this gives

$$
(B_{n+m})_{i,j} = \sum_{k=1}^{8} (B_n)_{i,k} (B_m)_{k,j}. \tag{1.2.22}
$$

Exercise 1.2.15 asks you to construct the adjacency matrix for a
triangle and for a square. We can also make a matrix that allows
for one-way streets (one-way edges), as Exercise 1.2.18 asks you
to show.


1.3 What the Actors Do: A Matrix as a Transformation


In Section 2.2 we will see how matrices are used to solve systems of linear
equations, but first let us consider a different view of matrices. In that view,
multiplication of a matrix by a vector is seen as a linear transformation, a
special kind of mapping. This is the central notion of linear algebra, which
allows us to put matrices in context and to see them as something other than
“pushing numbers around.”

The words mapping (or map) and function are synonyms, generally
used in different contexts. A function normally takes a point
and gives a number. Mapping is a more recent word; it was first used
in topology and geometry and has spread to all parts of mathematics.
In higher dimensions, we tend to use the word mapping rather
than function, but there is nothing wrong with calling a mapping
from $\mathbb{R}^5 \to \mathbb{R}^5$ a function.

Figure 1.3.1. A mapping: every point on the left goes to only one
point on the right.

Figure 1.3.2. Not a mapping: not well defined at $a$, not defined at $b$.

The domain of our mathematical “final grade function” is $\mathbb{R}^n$; its
range is $\mathbb{R}$. In practice this function has a “socially acceptable”
domain of the realistic grade vectors (no negative numbers, for example)
and also a “socially acceptable” range, the set of possible final
grades. Often a mathematical function modeling a real system has
domain and range considerably larger than the realistic values.


Mappings 

A mapping associates elements of one set to elements of another. In common
speech, we deal with mappings all the time. Like the character in Molière’s play
Le Bourgeois Gentilhomme, who discovered that he had been speaking prose
all his life without knowing it, we use mappings from early childhood, typically
with the word “of” or its equivalent: “the price of a book” goes from books to
money; “the capital of a country” goes from countries to cities.

This is not an analogy intended to ease you into the subject. “The father of”
is a mapping, not “sort of like” a mapping. We could write it with symbols:
$f(x) = y$ where $x$ = a person and $y$ = that person’s father: $f$(John Jr.) =
John. (Of course in English it would be more natural to say, “John Jr.’s father”
rather than “the father of John Jr.” A school of algebraists exists that uses
this notation: they write $(x)f$ rather than $f(x)$.)

The difference between expressions like “the father of” in everyday speech
and mathematical mappings is that in mathematics one must be explicit about
things that are taken for granted in speech.

Rigorous mathematical terminology requires specifying three things about a 
mapping: 

(1) the set of departure (the domain ), 

(2) the set of arrival (the range), 

(3) a rule going from one to the other. 

If the domain of a mapping $M$ is the real numbers $\mathbb{R}$ and its range is the
rational numbers $\mathbb{Q}$, we denote it $M : \mathbb{R} \to \mathbb{Q}$, which we read “$M$ from $\mathbb{R}$ to
$\mathbb{Q}$.” Such a mapping takes a real number as input and gives a rational number
as output.

What about a mapping $T : \mathbb{R}^n \to \mathbb{R}^m$? Its input is a vector with $n$ entries;
its output is a vector with $m$ entries: for example, the mapping from $\mathbb{R}^n$ to $\mathbb{R}$
that takes $n$ grades on homework, tests, and the final exam and gives you a
final grade in a course.

The rule for the “final grade” mapping above consists of giving weights to
homework, tests, and the final exam. But the rule for a mapping does not
have to be something that can be stated in a neat mathematical formula. For
example, the mapping $M : \mathbb{R} \to \mathbb{R}$ that changes every digit 3 and turns it into
a 5 is a valid mapping. When you invent a mapping you enjoy the rights of an
absolute dictator; you don’t have to justify your mapping by saying that “look,
if you square a number $x$, then multiply it by the cosine of […], subtract 7 and
then raise the whole thing to the power 3/2, and finally do such-and-such, then
if $x$ contains a 3, that 3 will turn into a 5, and everything else will remain
unchanged.” There isn’t any such sequence of operations that will “carry out”
your mapping for you, and you don’t need one.$^6$

Note that in correct mathematical usage, “the father of” as a
mapping from people to people is not the same mapping as “the father
of” as a mapping from people to men. A mapping includes a
domain, a range, and a rule going from the first to the second.

A mapping going “from” $\mathbb{R}^n$ “to” $\mathbb{R}^m$ is said to be defined on its domain $\mathbb{R}^n$.
A mapping in the mathematical sense must be well defined: it must be defined
at every point of the domain, and for each, must return a unique element of
the range. A mapping takes you, unambiguously, from one element of the set
of departure to one element of the set of arrival, as shown in Figures 1.3.1 and
1.3.2. (This does not mean that you can go unambiguously (or at all) in the
reverse direction; in Figure 1.3.1, going backwards from the point $d$ in the range
will take you to either $a$ or $b$ in the domain, and there is no path from $c$ in the
range to any point in the domain.)

Not all expressions “the this of the that” are true mappings in this sense.
“The daughter of,” as a mapping from people to girls and women, is not
everywhere defined, because not everyone has a daughter; it is not well defined
because some people have more than one daughter. It is not a mapping. But
“the number of daughters of,” as a mapping from women to numbers, is
everywhere defined and well defined, at a particular time. And “the father of,” as
a mapping from people to men, is everywhere defined, and well defined; every
person has a father, and only one. (We speak here of biological fathers.)

Remark. We use the word “range” to mean the space of arrival, or “target
space”; some authors use it to mean those elements of the arrival space that are
actually reached. In that usage, the range of the squaring function $F : \mathbb{R} \to \mathbb{R}$
given by $F(x) = x^2$ is the non-negative real numbers, while in our usage the
range is $\mathbb{R}$. We will see in Section 2.5 that what these authors call the range,
we call the image. As far as we know, those authors who use the word range to
denote the image either have no word for the space of arrival, or use the word
interchangeably to mean both space of arrival and image. We find it useful to
have two distinct words to denote these two distinct objects. △

$^6$ Here’s another “pathological” but perfectly valid mapping: the mapping $M : \mathbb{R} \to \mathbb{R}$
that takes every number in the interval $[0, 1]$ that can be written in base 3 without
using 1’s, changes every 2 to a 1, and then considers the result as a number in base 2.
If the number has a 1, it changes all the digits after the first 1 into 0’s and considers
the result as a number in base 2. Cantor proposed this mapping to point out the need
for greater precision in a number of theorems, in particular the fundamental theorem
of calculus. At the time it was viewed as pathological but it turns out to be important
for understanding Newton’s method for cubic polynomials in the complex plane. Mappings
just like it occur everywhere in complex dynamics, a surprising discovery of the early
1980’s.

Figure 1.3.3. An onto mapping, not 1-1: $a$ and $b$ go to the same point.

Figure 1.3.4. A mapping: 1-1, not onto; no points go to $a$ or to $b$.

“Onto” is a way to talk about the existence of solutions: a mapping
$T$ is onto if there is a solution to the equation $T(x) = b$, for every
$b$ in the set of arrival (the range of $T$). “One to one” is a way to
talk about the uniqueness of solutions: $T$ is one to one if for every
$b$ there is at most one solution to the equation $T(x) = b$.


Existence and uniqueness of solutions 

Given a mapping $T$, is there a solution to the equation $T(x) = b$, for every $b$ in
the range (set of arrival)? If so, the mapping is said to be onto, or surjective.
“Onto” is thus a way to talk about the existence of solutions. The mapping
“the father of” as a mapping from people to men is not onto, because not all
men are fathers. There is no solution to the equation “The father of $x$ is Mr.
Childless.” An onto mapping is shown in Figure 1.3.3.

A second question of interest concerns uniqueness of solutions. Is there at
most one solution to the equation $T(x) = b$, for every $b$ in the set of arrival, or
might there be many? If there is at most one solution to the equation $T(x) = b$,
the mapping $T$ is said to be one to one, or injective. The mapping “the father
of” is not one to one. There are, in fact, four solutions to the equation “The
father of $x$ is John Hubbard.” But the mapping “the twin sibling of,” as a
mapping from twins to twins, is one to one: the equation “the twin sibling of $x$
= $y$” has a unique solution for each $y$. “One to one” is thus a way to talk about
the uniqueness of solutions. A one to one mapping is shown in Figure 1.3.4.

A mapping $T$ that is both onto and one to one (also called bijective) has
an inverse mapping $T^{-1}$ that undoes it. Because $T$ is onto, $T^{-1}$ is everywhere
defined; because $T$ is one to one, $T^{-1}$ is well defined. So $T^{-1}$ qualifies as a
mapping. To summarize:

Definition 1.3.1 (Onto). A mapping is onto (or surjective) if every element
of the set of arrival corresponds to at least one element of the set of departure.

Definition 1.3.2 (One to one). A mapping is one to one (or injective) if
every element of the set of arrival corresponds to at most one element of the
set of departure.

Definition 1.3.3 (Bijective). A mapping is bijective if it is both onto and
one to one. A bijective mapping is invertible.


Example 1.3.4 (One to one and onto). The mapping “the Social Security
number of” as a mapping from Americans to numbers is not onto because there
exist numbers that aren’t Social Security numbers. But it is one to one: no two
Americans have the same Social Security number.

The mapping $f(x) = x^2$ from real numbers to real positive numbers is onto
because every real positive number has a real square root, but it is not one
to one because every real positive number has both a positive and a negative
square root. △
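
For mappings between finite sets, Definitions 1.3.1 through 1.3.3 can be tested directly. The sketch below is only an illustration: it stores a mapping as a Python dictionary (keys are the set of departure), and the people in it are hypothetical, chosen to echo “the father of” and “the twin sibling of.”

```python
def is_onto(mapping, arrival):
    """Every element of the set of arrival is reached at least once."""
    return set(arrival) <= set(mapping.values())


def is_one_to_one(mapping):
    """No element of the set of arrival is reached more than once."""
    values = list(mapping.values())
    return len(values) == len(set(values))


def inverse(mapping, arrival):
    """Inverse of a bijective mapping; raises ValueError otherwise."""
    if not (is_onto(mapping, arrival) and is_one_to_one(mapping)):
        raise ValueError("only a bijective mapping has an inverse")
    return {y: x for x, y in mapping.items()}


# Hypothetical data: "the father of", from people to men.
men = {"John", "Mr. Childless"}
father_of = {"John Jr.": "John", "Mary": "John"}

# Hypothetical data: "the twin sibling of", from twins to twins.
twins = {"Ann", "Bob", "Cy", "Di"}
twin_of = {"Ann": "Bob", "Bob": "Ann", "Cy": "Di", "Di": "Cy"}

print(is_onto(father_of, men), is_one_to_one(father_of))  # False False
print(is_onto(twin_of, twins), is_one_to_one(twin_of))    # True True
print(inverse(twin_of, twins) == twin_of)                 # True
```

Because a dictionary assigns exactly one value to each key, anything stored this way is automatically everywhere defined and well defined on its set of keys; only onto and one to one need checking.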



A composition is written from 
left to right but computed from 
right to left: you apply the map- 
ping g to the argument x and 
then apply the mapping $f$ to the
result. Exercise 1.3.12 provides 
some practice. 


When computers do composi- 
tions it is not quite true that com- 
position is associative. One way of 
doing the calculation may be more 
computationally effective than an- 
other; because of round-off errors, 
the computer may even come up 
with different answers, depend- 
ing on where the parentheses are 
placed. 


Although composition is asso- 
ciative, in many settings, 

$((f \circ g) \circ h)$ and $(f \circ (g \circ h))$

correspond to different ways of
thinking. Already, the “father
of the maternal grandfather” and
“the paternal grandfather of the
mother” are two ways of thinking
of the same person; the author of a
biography might use the first term
when focusing on the relationship
between the subject’s grandfather
and that grandfather’s father, and
use the other when focusing on the
relationship between the subject’s
mother and her grandfather.

Composition of mappings

Often one wishes to apply, consecutively, more than one mapping. This is
known as composition.

Definition 1.3.5 (Composition). The composition $f \circ g$ of two mappings,
$f$ and $g$, is

$$
(f \circ g)(x) = f(g(x)). \tag{1.3.1}
$$


Example 1.3.6 (Composition of “the father of” and “the mother of”).
Consider the following two mappings from the set of persons to the set of persons
(alive or dead): $F$, “the father of,” and $M$, “the mother of.” Composing these
gives:

$F \circ M$ (the father of the mother of = maternal grandfather of)

$M \circ F$ (the mother of the father of = paternal grandmother of).

It is clear in this case that composition is associative:

$$
F \circ (F \circ M) = (F \circ F) \circ M. \tag{1.3.2}
$$

The father of David’s maternal grandfather is the same person as the paternal
grandfather of David’s mother. Of course it is not commutative: the “father of
the mother” is not the “mother of the father.” △

Example 1.3.7 (Composition of two functions). If $f(x) = x - 1$, and
$g(x) = x^2$, then

$$
(f \circ g)(x) = f(g(x)) = x^2 - 1. \qquad \triangle \tag{1.3.3}
$$


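Proposition 1.3.8 (Composition is associative). Composition is
associative:

$$
f \circ g \circ h = (f \circ g) \circ h = f \circ (g \circ h). \tag{1.3.4}
$$

Proof. This is simply the computation

$$
\begin{aligned}
((f \circ g) \circ h)(x) &= (f \circ g)(h(x)) = f(g(h(x))) \quad \text{whereas} \\
(f \circ (g \circ h))(x) &= f((g \circ h)(x)) = f(g(h(x))). \qquad \square
\end{aligned} \tag{1.3.5}
$$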

You may find this “proof” devoid of content. Composition of mappings is 
part of our basic thought processes: you use a composition any time you speak 
of “the this of the that of the other.” 
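
Definition 1.3.5 and Proposition 1.3.8 translate almost word for word into code. This is a small sketch, not part of the original text: `compose` is an illustrative helper, `f` and `g` are the functions of Example 1.3.7, and `h` is a hypothetical third function added only to exercise associativity.

```python
def compose(f, g):
    """Return the composition f o g, i.e. the mapping x -> f(g(x))."""
    return lambda x: f(g(x))


def f(x):
    return x - 1


def g(x):
    return x ** 2


def h(x):  # hypothetical third mapping, not from the text
    return 2 * x


f_after_g = compose(f, g)  # x -> x**2 - 1
g_after_f = compose(g, f)  # x -> (x - 1)**2

print(f_after_g(3))  # 8
print(g_after_f(3))  # 4, so composition is not commutative

# Both ways of placing the parentheses give x -> f(g(h(x))).
left = compose(compose(f, g), h)
right = compose(f, compose(g, h))
print(all(left(x) == right(x) for x in range(-5, 6)))  # True
```

As the margin note warns, with floating-point arithmetic the two parenthesizations of a long computation need not agree exactly; the check above uses integers, where they do.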
Matrices and transformations


The words transformation and 
mapping are synonyms, so we 
could call the matrix A of Figure 
1.3.5 a mapping. But in linear al- 
gebra the word transformation is 
more common. In fact, the matrix 
A is a linear transformation , but 
we haven't formally defined that 
term yet. 


Mathematicians usually denote 
a linear transformation by its as- 
sociated matrix; rather than say- 
ing that the “dinners to shopping 
list” transformation is the multi- 
plication Ab = c, they would call 
this transformation A. 


A special class of mappings consists of those mappings that are encoded by 
matrices. By “encoded” we mean that multiplication by a matrix is the rule 
that turns an input vector into an output vector: just as f(x) = y takes a 
number x and gives y, Av = w takes a vector v and gives a vector w. 

Such mappings, called linear transformations, are of central importance in
linear algebra (and every place else in mathematics). Throughout mathematics,
the constructs of central interest are the mappings that preserve whatever
structure is at hand. In linear algebra, “preserve structure” means that you can
first add, then map, or first map, then add, and get the same answer; similarly,
first multiplying by a scalar and then mapping gives the same result as first
mapping and then multiplying by a scalar. One of the great discoveries at the
end of the 19th century was that the natural way to do mathematics is to look
at sets with structure, such as $\mathbb{R}^n$, with addition and multiplication by scalars,
and to consider the mappings that preserve that structure.

We give a mathematical definition of linear transformations in Definition 
1.3.11, but first let’s see an example. 

Example 1.3.9 (Frozen dinners). In a food processing plant making three 
types of frozen dinners, one might associate the number of dinners of various 
sorts produced to the total ingredients needed (beef, chicken, noodles, cream, 
salt, . . . ). As shown in Figure 1.3.5, this mapping is given by multiplication (on 
the left) by the matrix A , which gives the amount of each ingredient needed for 
each dinner: A tells how to go from b, which tells how many dinners of each kind 
are produced, to the product c, which tells the total ingredients needed. For 
example, 21 pounds of beef are needed, because (.25 x 60) +(.20 x 30) +(0 x 40) = 
21. For chicken, (0 x 60) + (0 x 30) + (.45 x 40) = 18. 

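$$
\underbrace{
\begin{array}{r|ccc}
 & \text{beef stroganoff} & \text{ravioli} & \text{fried chicken} \\ \hline
\text{lbs. of beef} \to & .25 & .20 & 0 \\
\text{lbs. of chicken} \to & 0 & 0 & .45 \\
\text{lbs. of noodles} \to & \cdots & \cdots & \cdots \\
\text{lbs. of rice} \to & \cdots & \cdots & \cdots \\
 & \vdots & \vdots & \vdots \\
\text{liters of cream} \to & \cdots & \cdots & \cdots
\end{array}
}_{A:\ \text{ingredients per dinner}}
\quad
\underbrace{
\begin{bmatrix}
60 \text{ stroganoff} \\ 30 \text{ ravioli} \\ 40 \text{ fried chicken}
\end{bmatrix}
}_{\vec{b}:\ \text{dinners produced}}
=
\underbrace{
\begin{bmatrix}
21 \text{ lb of beef} \\ 18 \text{ lb of chicken} \\ \cdots \text{ lb of noodles} \\ \cdots \text{ lb of rice} \\ \vdots \\ \cdots \text{ liters of cream}
\end{bmatrix}
}_{\vec{c}:\ \text{total needed}}
$$

FIGURE 1.3.5. The matrix $A$ is the transformation associating the number of dinners
of various sorts produced to the total ingredients needed. △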



Notice that matrix multiplica- 
tion emphatically does not allow 
for feedback. For instance, it does 
not allow for the possibility that 
if you buy more you will get a 
discount for quantity, or that if 
you buy even more you might cre- 
ate scarcity and drive prices up. 
This is a key feature of linearity , 
and is the fundamental weakness 
of all models that linearize map- 
pings and interactions. 



Figure 1.3.6. For any linear transformation $T$, $T(a\vec{x}) = aT(\vec{x})$.

Example 1.3.10 (Frozen foods: composition). For the food plant of
Example 1.3.9, one might make a matrix $D$, 1 high and $n$ wide ($n$ being the
total number of ingredients), that would list the price of each ingredient, per
pound or liter. The product $DA$ would then tell the cost of the ingredients in
each dinner, since $A$ tells how much of each ingredient is in each dinner. The
product $(DA)\vec{b}$ would give the total cost of the ingredients for all $\vec{b}$ dinners.
We could also compose these transformations in a different order, first figuring
how much of each ingredient we need for all $\vec{b}$ dinners: the product $A\vec{b}$. Then,
using $D$, we could figure the total cost: $D(A\vec{b})$. Clearly, $(DA)\vec{b} = D(A\vec{b})$,
although the two correspond to slightly different perspectives. △
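
The two orders of computation in Examples 1.3.9 and 1.3.10 can be tried on the part of the data that Figure 1.3.5 actually gives. In this sketch only the beef and chicken rows of $A$ and the production vector come from the text; the prices in `D` are hypothetical, and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction as Fr

# Rows: lbs. of beef, lbs. of chicken. Columns: stroganoff, ravioli, fried chicken.
A = [
    [Fr(25, 100), Fr(20, 100), Fr(0)],
    [Fr(0), Fr(0), Fr(45, 100)],
]
b = [60, 30, 40]  # dinners produced
D = [4, 3]        # hypothetical prices per pound of beef and of chicken


def dot(u, v):
    """Sum of the products of corresponding entries."""
    return sum(x * y for x, y in zip(u, v))


def mat_vec(M, v):
    """Matrix times column vector."""
    return [dot(row, v) for row in M]


def row_mat(r, M):
    """Row vector times matrix."""
    return [dot(r, [row[j] for row in M]) for j in range(len(M[0]))]


c = mat_vec(A, b)             # total ingredients needed
cost_via_c = dot(D, c)        # D(Ab): price the totals
DA = row_mat(D, A)            # ingredient cost of one dinner of each kind
cost_via_DA = dot(DA, b)      # (DA)b: cost per dinner times dinners produced

print(c == [21, 18])              # True: 21 lb of beef, 18 lb of chicken
print(cost_via_c == cost_via_DA)  # True
```

With these made-up prices both routes give a total of 138; the first goes through the shopping list, the second through the cost of each kind of dinner.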


Real-life matrices 


We kept Example 1.3.9 simple, but you can easily see how this works in a more 
realistic situation. In real life — modeling the economy, designing buildings, 
modeling airflow over the wing of an airplane — vectors of input data contain 
tens of thousands of entries, or more, and the matrix giving the transformation 
has millions of entries. 

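We hope you can begin to see that a matrix might be a very useful way of
mapping from $\mathbb{R}^n$ to $\mathbb{R}^m$. To go from $\mathbb{R}^3$, where vectors all have three entries,
$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$, to $\mathbb{R}^4$, where vectors have four entries,
$\vec{w} = \begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ w_4 \end{bmatrix}$, you would multiply $\vec{v}$ on the left by a $4 \times 3$ matrix:

$$
\begin{bmatrix}
a_{1,1} & a_{1,2} & a_{1,3} \\
a_{2,1} & a_{2,2} & a_{2,3} \\
a_{3,1} & a_{3,2} & a_{3,3} \\
a_{4,1} & a_{4,2} & a_{4,3}
\end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}
=
\begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ w_4 \end{bmatrix}. \tag{1.3.6}
$$

One can imagine doing the same thing when the $n$ and $m$ of $\mathbb{R}^n$ and $\mathbb{R}^m$ are
arbitrarily large. One can somewhat less easily imagine extending the same idea
to infinite-dimensional spaces, but making sense of the notion of multiplication
of infinite matrices gets into some deep water, beyond the scope of this book.
Our matrices are finite: rectangular arrays, $m$ high and $n$ wide.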


Linearity 

The assumption that a transformation is linear is the main simplifying assumption
that scientists and social scientists (especially economists) make to understand
their models of the world. Roughly speaking, linearity means that if you
double the input, you double the output; triple the input, triple the output . . . .
In Example 1.3.9, the transformation $A$ is linear: each frozen beef stroganoff
dinner will require the same amount of beef, whether one is making one dinner
or 10,000. We treated the price function $D$ in Example 1.3.10 as linear, but in
real life it is cheaper per pound to buy 10,000 pounds of beef than one. Many,
perhaps most, real-life problems are nonlinear. It is always easier to treat them
as if they were linear; knowing when it is safe to do so is a central issue of
applied mathematics.


The Italian mathematician Sal- 
vatore Pincherle, one of the early 
pioneers of linear algebra, called 
a linear transformation a distribu- 
tive transformation (operazioni 
distributive), a name that is per- 
haps more suggestive of the formu- 
las than is “linear” 


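Definition 1.3.11 (Linear transformation). A linear transformation
$T : \mathbb{R}^n \to \mathbb{R}^m$ is a mapping such that for all scalars $a$ and all $\vec{v}, \vec{w} \in \mathbb{R}^n$,

$$
T(\vec{v} + \vec{w}) = T(\vec{v}) + T(\vec{w}) \quad \text{and} \quad T(a\vec{v}) = aT(\vec{v}). \tag{1.3.7}
$$

The two formulas can be combined into one (where $b$ is also a scalar):

$$
T(a\vec{v} + b\vec{w}) = aT(\vec{v}) + bT(\vec{w}). \tag{1.3.8}
$$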


Example 1.3.12 (Linearity at the checkout counter). Suppose you need 
to buy three gallons of cider and six packages of doughnuts for a Halloween 
party. The transformation T is performed by the scanner at the checkout 
counter, reading the UPC code to determine the price. Equation 1.3.7 is noth- 
ing but the obvious statement that if you do your shopping all at once, it will 
cost you exactly the same amount as it will if you go through the checkout line 
nine times, once for each item: 

$$
T(3 \text{ gal. cider} + 6 \text{ pkg. doughnuts}) = 3\,T(1 \text{ gal. cider}) + 6\,T(1 \text{ pkg. doughnuts}),
$$

unless the supermarket introduces nonlinearities such as “buy two, get one free.”


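Every linear transformation is given by a matrix. The matrix can
be found by seeing how the transformation acts on the standard
basis vectors

$$
\vec{e}_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \dots, \quad
\vec{e}_n = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}.
$$
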


Example 1.3.13 (A matrix gives a linear transformation). Let $A$ be an
$m \times n$ matrix. Then $A$ defines a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ by matrix
multiplication:

$$
T(\vec{v}) = A\vec{v}. \tag{1.3.9}
$$

Such mappings are indeed linear, because $A(\vec{v} + \vec{w}) = A\vec{v} + A\vec{w}$ and $A(c\vec{v}) =
cA\vec{v}$, as you are asked to check in Exercise 1.3.14. △
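
Equation 1.3.8 gives a quick numerical test of linearity. The sketch below is illustrative only: the matrix `A`, the vectors, and the scalars are hypothetical, and the entrywise squaring map `S` is included as a mapping that fails the test. Passing the test for one choice of vectors does not prove linearity, but failing it once disproves it.

```python
def mat_vec(M, v):
    """Matrix times column vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def add(v, w):
    return [x + y for x, y in zip(v, w)]


def scale(a, v):
    return [a * x for x in v]


A = [[1, 2, 0],
     [0, 1, -1]]  # hypothetical 2 x 3 matrix


def T(v):
    """The mapping of Example 1.3.13 for the matrix A above."""
    return mat_vec(A, v)


def S(v):
    """Squares each entry; not a linear transformation."""
    return [x * x for x in v]


def satisfies_1_3_8(F, v, w, a, b):
    """Check F(av + bw) == aF(v) + bF(w) for one choice of v, w, a, b."""
    return F(add(scale(a, v), scale(b, w))) == add(scale(a, F(v)), scale(b, F(w)))


v, w = [1, 2, 3], [-1, 0, 4]
print(satisfies_1_3_8(T, v, w, 2, -3))  # True
print(satisfies_1_3_8(S, v, w, 2, -3))  # False
```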

The crucial result of Theorem 1.3.14 below is that every linear transformation
$\mathbb{R}^n \to \mathbb{R}^m$ is given by a matrix, which one can construct by seeing how the
transformation acts on the standard basis vectors. This is rather remarkable. A
priori the notion of a transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$ is quite vague and abstract;
one might not think that merely by imposing the condition of linearity one could
say something so precise about this shapeless set of mappings as saying that
each is given by a matrix.

To find the matrix for a linear transformation, ask: what is the
result of applying that transformation to the standard basis vectors?
The $i$th column of the matrix for a linear transformation $T$ is $T(\vec{e}_i)$;
to get the $i$th column of the matrix, just ask: what does the
transformation do to $\vec{e}_i$?

Theorem 1.3.14 (Linear transformations given by matrices). Every
linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ is given by multiplication by the $m \times n$
matrix $[T]$, the $i$th column of which is $T(\vec{e}_i)$.

Putting the columns together, this gives $T(\vec{v}) = [T]\vec{v}$. This means that
Example 1.3.13 is “the general” linear transformation in $\mathbb{R}^n$.

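Proof. Start with a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$, and manufacture the
matrix $[T]$ according to the rule given immediately above: the $i$th column of
$[T]$ is $T(\vec{e}_i)$. We may write any vector $\vec{v} \in \mathbb{R}^n$ in terms of the standard basis
vectors:

$$
\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}
= v_1 \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
+ v_2 \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}
+ \cdots
+ v_n \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}. \tag{1.3.10}
$$

We can write this more succinctly:

$$
\vec{v} = v_1\vec{e}_1 + v_2\vec{e}_2 + \cdots + v_n\vec{e}_n, \quad \text{or, with sum notation,} \quad \vec{v} = \sum_{i=1}^{n} v_i\vec{e}_i. \tag{1.3.11}
$$

Then by linearity,

$$
T(\vec{v}) = T\left(\sum_{i=1}^{n} v_i\vec{e}_i\right) = \sum_{i=1}^{n} v_i T(\vec{e}_i), \tag{1.3.12}
$$

which is precisely the column vector $[T]\vec{v}$. □

Figure 1.3.7. Orthogonal projection of a point onto the $x$-axis.
“Projection” means we draw a line from the point to the $x$-axis.
“Orthogonal” means we make that line perpendicular to the $x$-axis.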

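If this isn’t apparent, try translating it out of sum notation:

$$
T(\vec{v}) = \sum_{i=1}^{n} v_i T(\vec{e}_i)
= v_1 \underbrace{T(\vec{e}_1)}_{\text{1st col. of } [T]}
+ v_2 \underbrace{T(\vec{e}_2)}_{\text{2nd col. of } [T]}
+ \cdots
+ v_n \underbrace{T(\vec{e}_n)}_{n\text{th col. of } [T]}
= [T] \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}
= [T]\vec{v}. \tag{1.3.13}
$$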


Figure 1.3.8. Every point on one face is reflected to the corresponding
point of the other.

Example 1.3.15 (Finding the matrix of a linear transformation). What
is the matrix for the transformation that takes any point in $\mathbb{R}^2$ and gives its
orthogonal (perpendicular) projection on the $x$-axis, as illustrated in Figure
1.3.7? You should assume this transformation is linear. Check your answer in
the footnote below.$^7$

What is the orthogonal projection on the line of equation $x = y$ of the point
$\begin{pmatrix} 3 \\ -1 \end{pmatrix}$? Again, assume this is a linear transformation, and check below.$^8$


Example 1.3.16 (Reflection with respect to a line through the origin).
Let us show that the transformation that reflects a point through a line through
the origin is linear. This is the transformation that takes a point on one side
of the line and moves it perpendicular to the line, crosses it, and continues the
same distance away from the line, as shown in Figure 1.3.8.

We will first assume that the transformation $T$ is linear, and thus given by a
matrix whose $i$th column is $T(\vec{e}_i)$. Again, all we have to do is figure out what
$T$ does to $\vec{e}_1$ and $\vec{e}_2$. We can then apply that transformation to any point
we like, by multiplying it by the matrix. There’s no need to do an elaborate
computation for each point.

To obtain the first column of our matrix we thus consider where $\vec{e}_1$ is mapped
to. Suppose that our line makes an angle $\theta$ (theta) with the $x$-axis, as shown
in Figure 1.3.9. Then $\vec{e}_1$ is mapped to $\begin{bmatrix} \cos 2\theta \\ \sin 2\theta \end{bmatrix}$. To get the second column, we
see that $\vec{e}_2$ is mapped to

$$
\begin{bmatrix} \cos(2\theta - 90^\circ) \\ \sin(2\theta - 90^\circ) \end{bmatrix}
= \begin{bmatrix} \sin 2\theta \\ -\cos 2\theta \end{bmatrix}. \tag{1.3.14}
$$

So the “reflection” matrix is

$$
T = \begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix}. \tag{1.3.15}
$$

For example, we can compute that the point with coordinates $x = 2$, $y = 1$
reflects to the point $\begin{pmatrix} 2\cos 2\theta + \sin 2\theta \\ 2\sin 2\theta - \cos 2\theta \end{pmatrix}$, since

$$
\begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix}
\begin{bmatrix} 2 \\ 1 \end{bmatrix}
= \begin{bmatrix} 2\cos 2\theta + \sin 2\theta \\ 2\sin 2\theta - \cos 2\theta \end{bmatrix}. \tag{1.3.16}
$$

The transformation is indeed linear because given two vectors $\vec{v}$ and $\vec{w}$, we
have $T(\vec{v} + \vec{w}) = T(\vec{v}) + T(\vec{w})$, as shown in Figure 1.3.10. It is also apparent
from the figure that $T(c\vec{v}) = cT(\vec{v})$. △

Figure 1.3.9. The reflection maps $\vec{e}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ to $\begin{bmatrix} \cos 2\theta \\ \sin 2\theta \end{bmatrix}$
and $\vec{e}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ to $\begin{bmatrix} \sin 2\theta \\ -\cos 2\theta \end{bmatrix}$.

$^7$ The matrix is $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$, which you will note is consistent with Figure 1.3.7, since
$\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ 0 \end{bmatrix}$. If you had trouble with this question, you are making life too
[…] transformation itself to construct its matrix. Just ask: what is the result of applying
that transformation to the standard basis vectors? The $i$th column of the matrix for
a linear transformation $T$ is $T(\vec{e}_i)$. So to get the first column of the matrix, ask, what
does the transformation do to $\vec{e}_1$? Since $\vec{e}_1$ lies on the $x$-axis, it is projected onto
itself. The first column of the transformation matrix is thus $\vec{e}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$. The second
standard basis vector, $\vec{e}_2$, lies on the $y$-axis and is projected onto the origin, so the
second column of the matrix is $\begin{bmatrix} 0 \\ 0 \end{bmatrix}$.

$^8$ The matrix for this linear transformation is $\begin{bmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{bmatrix}$, since the perpendicular
line from $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ to the line of equation $x = y$ intersects that line at $\begin{pmatrix} 1/2 \\ 1/2 \end{pmatrix}$, as does
the perpendicular line from $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$. To determine the orthogonal projection of the point
$\begin{pmatrix} 3 \\ -1 \end{pmatrix}$, we multiply $\begin{bmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{bmatrix} \begin{bmatrix} 3 \\ -1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$. Note that we have to consider the
point $\begin{pmatrix} 3 \\ -1 \end{pmatrix}$ as a vector in order to carry out the multiplication; we can’t multiply a
matrix and a point.

Example 1.3.17 (Rotation by an angle $\theta$). The matrix giving the
transformation $R$ (“rotation by $\theta$ around the origin”) is

$$
[R(\vec{e}_1), R(\vec{e}_2)] = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.
$$

The transformation is linear, as shown in Figure 1.3.11: rather than thinking
of rotating just the vectors $\vec{v}$ and $\vec{w}$, we can rotate the whole parallelogram
$P(\vec{v}, \vec{w})$ that they span. Then $R(P(\vec{v}, \vec{w}))$ is the parallelogram spanned by
$R(\vec{v})$, $R(\vec{w})$, and in particular the diagonal of $R(P(\vec{v}, \vec{w}))$ is $R(\vec{v} + \vec{w})$. △


Exercise 1.3.15 asks you to use 
composition of the transformation 
in Example 1.3.17 to derive the 
fundamental theorems of trigo- 
nometry. 


FIGURE 1.3.10. Reflection is linear: the sum of the reflections is the reflection of
the sum.

Exercise 1.3.13 asks you to find the matrix for the transformation $\mathbb{R}^3 \to \mathbb{R}^3$
that rotates by 30° around the $y$-axis.

Now we will see that composition corresponds to matrix multiplication. 



Theorem 1.3.18 (Composition corresponds to matrix multiplication).
Suppose $S : \mathbb{R}^n \to \mathbb{R}^m$ and $T : \mathbb{R}^m \to \mathbb{R}^l$ are linear transformations
given by the matrices $[S]$ and $[T]$ respectively. Then the matrix of the
composition $T \circ S$ equals the product $[T][S]$ of the matrices of $S$ and $T$:

$$
[T \circ S] = [T][S]. \tag{1.3.17}
$$




Proof. This is a statement about matrix multiplication and cannot be proved
without explicit reference to how matrices are multiplied. Our only references
to the multiplication algorithm will be the following facts, both discussed in
Section 1.2.

(1) $A\vec{e}_i$ is the $i$th column of $A$ (as illustrated by Example 1.2.5);

(2) the $i$th column of $AB$ is $A\vec{b}_i$, where $\vec{b}_i$ is the $i$th column of $B$ (as
illustrated by Example 1.2.6).


Figure 1.3.11. 
Rotation is linear: the sum of 
the rotations is the rotation of the 
sum. 


Many mathematicians would 
say that Theorem 1.3.18 justifies 
the definition of matrix multipli- 
cation. This may seem odd to the 
novice, who probably feels that 
composition of linear mappings is 
more baroque than matrix multi- 
plication. 


Now to prove the theorem; to make it unambiguous when we are applying
a transformation to a variable and when we are multiplying matrices, we will
write matrix multiplication with a star *.

The composition $(T \circ S)$ is itself a linear transformation and thus can be
given by a matrix, which we will call $[T \circ S]$, accounting for the first equality
below. The definition of composition gives the second equality. Next we replace
S by its matrix [S], and finally we replace T by its matrix:

$$[T \circ S] * \vec e_i = (T \circ S)(\vec e_i) = T(S(\vec e_i)) = T([S] * \vec e_i) = [T] * ([S] * \vec e_i). \tag{1.3.18}$$

So the first term in this sequence, $[T \circ S] * \vec e_i$, which is the ith column of
$[T \circ S]$ by fact (1), is equal to

$$[T] * \text{the } i\text{th column of } [S], \tag{1.3.19}$$

which is the ith column of [T] * [S] by fact (2).

Each column of $[T \circ S]$ is equal to the corresponding column of [T] * [S], so
the two matrices are equal. □
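Theorem 1.3.18 can be checked numerically on the rotations of Example 1.3.17. The following is a minimal sketch in Python (our choice of language; the helper names `rotation`, `matmul` and `apply` and the sample angles are assumptions made for illustration). It applies S and then T to a vector, and compares the result with applying the single matrix [T][S].

```python
import math

def rotation(theta):
    """Matrix (list of rows) of the rotation by theta radians around the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(A, B):
    """Matrix product AB: entry (i, j) is row i of A times column j of B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def apply(A, v):
    """Apply the linear transformation with matrix A to the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

alpha, beta = 0.3, 1.1                 # arbitrary sample angles
S, T = rotation(alpha), rotation(beta)
v = [2.0, -1.0]                        # arbitrary sample vector

composed = apply(T, apply(S, v))       # (T o S)(v): first S, then T
product = apply(matmul(T, S), v)       # ([T][S]) v

print(composed, product)               # the two agree up to rounding
```

Since rotating by α and then by β is rotating by α + β, the product `matmul(T, S)` should also agree with `rotation(alpha + beta)`, which is the idea behind Exercise 1.3.15.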


Exercise 1.3.16 asks you to con- 
firm by matrix multiplication that 
reflecting a point across the line, 
and then back again, lands you 
back at the original point. 


We gave a computational proof of the associativity of matrix multiplication 
in Proposition 1.2.8; this associativity is also an immediate consequence of 
Theorem 1.3.18. 


Corollary 1.3.19. Matrix multiplication is associative: if A, B, C are matrices
such that the matrix multiplication (AB)C is allowed, then so is A(BC),
and they are equal.


Proof. Composition of mappings is associative. □

1.4 Geometry of $\mathbb{R}^n$

... To acquire the feeling for calculus that is indispensable even in the
most abstract speculations, one must have learned to distinguish that
which is "big" from that which is "little," that which is "preponderant"
and that which is "negligible."
-Jean Dieudonné, Calcul infinitésimal

Whereas algebra is all about equalities, calculus is about inequalities: about 
things being arbitrarily small or large, about some terms being dominant or 
negligible compared to others. Rather than saying that things are exactly true, 
we need to be able to say that they are almost true, so they “become true in 
the limit.” 

For example, $(5 + h)^3 = 125 + 75h + \cdots$, so if h = .01, we could use the
approximation

$$(5.01)^3 \approx 125 + (75 \cdot .01) = 125.75. \tag{1.4.1}$$

The issue then is to quantify the error. 
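For this particular example the error can be written down exactly, since $(5 + h)^3 = 125 + 75h + 15h^2 + h^3$: the neglected terms are $15h^2 + h^3$. The short Python sketch below (variable names are ours) compares the exact value with the approximation for h = .01.

```python
h = 0.01
exact = (5 + h) ** 3            # 5.01 cubed
approx = 125 + 75 * h           # the approximation of Equation 1.4.1
error = exact - approx          # what the approximation leaves out
neglected = 15 * h**2 + h**3    # the terms dropped from the expansion

print(exact, approx, error, neglected)
```

The error is about 0.0015, which is small compared to the correction term 75h = 0.75, and shrinks like $h^2$ as h gets smaller.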

Such notions cannot be discussed in the language about $\mathbb{R}^n$ that has been
developed so far: we need lengths of vectors to say that vectors are small, or 
that points are close to each other. We will also need lengths of matrices to say 
that linear transformations are “close” to each other. Having a notion of dis- 
tance between transformations will be crucial in proving that under appropriate 
circumstances Newton’s method converges to a solution (Section 2.7). 

In this section we introduce these notions. The formulas are all more or 
less immediate generalizations of the Pythagorean theorem and the cosine law, 
but they acquire a whole new meaning in higher dimensions (and more yet in 
infinitely many dimensions). 


The dot product 

The dot product in $\mathbb{R}^n$ is the basic construct that gives rise to all the geometric
notions of lengths and angles. 


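Definition 1.4.1 (Dot product). The dot product $\vec x \cdot \vec y$ of two vectors
$\vec x, \vec y \in \mathbb{R}^n$ is:

$$\vec x \cdot \vec y = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \cdot \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n. \tag{1.4.2}$$

The dot product is also known
as the standard inner product.

For example,

$$\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = (1 \times 1) + (2 \times 0) + (3 \times 1) = 4.$$

The dot product is obviously commutative:

$$\vec x \cdot \vec y = \vec y \cdot \vec x, \tag{1.4.3}$$

and it is not much harder to check that it is distributive, i.e., that

$$\vec x \cdot (\vec y_1 + \vec y_2) = (\vec x \cdot \vec y_1) + (\vec x \cdot \vec y_2), \quad\text{and}\quad (\vec x_1 + \vec x_2) \cdot \vec y = (\vec x_1 \cdot \vec y) + (\vec x_2 \cdot \vec y).$$

What we call the length of a
vector is often called the Euclidean
norm.

Some texts use double lines to
denote the length of a vector: $\|\vec v\|$
rather than $|\vec v|$. We reserve double
lines to denote the norm of a matrix,
defined in Section 2.8. Please
do not confuse the length of a vector
with the absolute value of a
number. In one dimension, the
two are the same; the "length" of
the one-entry vector $\vec v = [-2]$ is
$\sqrt{(-2)^2} = 2$.

The dot product of two vectors can be written as the matrix product of the
transpose of one vector by the other: $\vec x \cdot \vec y = \vec x^\top \vec y = \vec y^\top \vec x$.

$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \cdot \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \quad\text{is the same as}\quad \underbrace{[x_1 \ x_2 \ \cdots \ x_n]}_{\text{transpose } \vec x^\top} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \underbrace{[x_1 y_1 + x_2 y_2 + \cdots + x_n y_n]}_{\vec x^\top \vec y}. \tag{1.4.5}$$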
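Definition 1.4.1 translates almost word for word into code. Here is a minimal Python sketch (the function name `dot` and the sample vectors are ours, chosen for illustration) that computes the example above and spot-checks commutativity and distributivity on one set of vectors.

```python
def dot(x, y):
    """Dot product x1*y1 + x2*y2 + ... + xn*yn of two vectors in R^n."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of entries")
    return sum(xi * yi for xi, yi in zip(x, y))

x = [1, 2, 3]
y = [1, 0, 1]
print(dot(x, y))                      # (1 x 1) + (2 x 0) + (3 x 1) = 4

# Commutativity: x . y = y . x
print(dot(x, y) == dot(y, x))

# Distributivity: x . (y1 + y2) = x . y1 + x . y2
y1, y2 = [4, -1, 2], [0, 5, -3]
y_sum = [a + b for a, b in zip(y1, y2)]
print(dot(x, y_sum) == dot(x, y1) + dot(x, y2))
```

A check on one example is of course not a proof; it is only a quick way to catch a misread formula.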


Conversely, the i, jth entry of the matrix product AB is the dot product of
the jth column of B and the transpose of the ith row of A. For example, the
entry 1,2 of AB below is 5, which is the dot product of the transpose of the
first row of A and the second column of B:

[Equation 1.4.6: the product AB, with the transpose of the 1st row of A and the 2nd column of B marked.]
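The statement about entries of AB can be tried out on a small example. In the Python sketch below, the matrices A and B are hypothetical (they are not the matrices of Equation 1.4.6; we merely chose them so that the 1,2 entry of AB is 5), and `matmul` multiplies matrices by the usual row-times-column rule.

```python
def dot(x, y):
    """Dot product of two vectors of the same size."""
    return sum(xi * yi for xi, yi in zip(x, y))

def matmul(A, B):
    """Matrix product AB computed entry by entry with explicit loops."""
    n, m, k = len(A), len(B), len(B[0])
    C = [[0] * k for _ in range(n)]
    for i in range(n):
        for j in range(k):
            for p in range(m):
                C[i][j] += A[i][p] * B[p][j]
    return C

A = [[2, 1],
     [0, 3]]          # hypothetical sample matrix
B = [[1, 2],
     [4, 1]]          # hypothetical sample matrix

AB = matmul(A, B)
first_row_of_A = A[0]                       # read as a vector, i.e., transposed
second_col_of_B = [row[1] for row in B]
entry_12 = dot(first_row_of_A, second_col_of_B)

print(AB[0][1], entry_12)                   # both are 2*2 + 1*1 = 5
```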


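Definition 1.4.2 (Length of a vector). The length $|\vec x|$ of a vector $\vec x$ is

$$|\vec x| = \sqrt{\vec x \cdot \vec x} = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}. \tag{1.4.7}$$

What is the length $|\vec v|$ of $\vec v = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$?^9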


Length and dot product: geometric interpretation in R 2 and R 3 

In the plane and in space, the length of a vector has a geometric interpretation: 
|x| is then the ordinary distance between 0 and x. As indicated by Figure 1.4.1, 


this is exactly what the Pythagorean theorem says in the case of the plane; in
space, it is still true, since OAB is still a right triangle.

9 Its length is $|\vec v| = \sqrt{1^2 + 1^2 + 1^2} = \sqrt 3$.


Definition 1.4.2 is a version of
the Pythagorean theorem: in two
dimensions, the vector $\vec x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$
is the hypotenuse of a right triangle
of which the other two sides
have lengths $x_1$ and $x_2$:

$$|\vec x|^2 = x_1^2 + x_2^2.$$



FIGURE 1.4.1. In the plane, the length of the vector with coordinates (a, b) is the
ordinary distance between 0 and the point with coordinates (a, b). In space, the
length of the vector with coordinates (a, b, c) is the ordinary distance between 0
and the point with coordinates (a, b, c).


The dot product also has an interpretation in ordinary geometry: 


Proposition 1.4.3 (Geometric interpretation of the dot product).
If $\vec x$, $\vec y$ are two vectors in $\mathbb{R}^2$ or $\mathbb{R}^3$, then

$$\vec x \cdot \vec y = |\vec x||\vec y| \cos\alpha, \tag{1.4.8}$$

where α is the angle between $\vec x$ and $\vec y$.



Figure 1.4.2.

The cosine law gives

$$|\vec x - \vec y|^2 = |\vec x|^2 + |\vec y|^2 - 2|\vec x||\vec y|\cos\alpha.$$


Remark. Proposition 1.4.3 says that the dot product is independent of the
coordinate system we use. You can rotate a pair of vectors in the plane, or in
space, without changing the dot product, as long as you don’t change the angle
between them. △

Proof. This is an application of the cosine law from trigonometry, which says
that if you know all of a triangle’s sides, or two sides and the angle between
them, or two angles and a side, then you can determine the others. Let a
triangle have sides of length a, b, c, and let γ be the angle opposite the side with
length c. Then

$$c^2 = a^2 + b^2 - 2ab\cos\gamma. \qquad \text{Cosine Law} \tag{1.4.9}$$

Consider the triangle formed by the three vectors $\vec x$, $\vec y$ and $\vec x - \vec y$, and let α be
the angle between $\vec x$ and $\vec y$, as shown in Figure 1.4.2.

If you don’t see how we got
the numerator in Equation 1.4.12,
note that the dot product of a
standard basis vector $\vec e_i$ and any
vector $\vec v$ is the ith entry of $\vec v$. For
example, in $\mathbb{R}^3$,

$$\vec v \cdot \vec e_2 = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} \cdot \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = 0 + v_2 + 0 = v_2.$$



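Applying the cosine law, we find

$$|\vec x - \vec y|^2 = |\vec x|^2 + |\vec y|^2 - 2|\vec x||\vec y|\cos\alpha. \tag{1.4.10}$$

But we can also write (remembering that the dot product is distributive):

$$\begin{aligned} |\vec x - \vec y|^2 &= (\vec x - \vec y)\cdot(\vec x - \vec y) = ((\vec x - \vec y)\cdot\vec x) - ((\vec x - \vec y)\cdot\vec y) \\ &= (\vec x\cdot\vec x) - (\vec y\cdot\vec x) - (\vec x\cdot\vec y) + (\vec y\cdot\vec y) \\ &= (\vec x\cdot\vec x) + (\vec y\cdot\vec y) - 2\vec x\cdot\vec y = |\vec x|^2 + |\vec y|^2 - 2\vec x\cdot\vec y. \end{aligned} \tag{1.4.11}$$

Comparing Equations 1.4.10 and 1.4.11 leads to

$$\vec x\cdot\vec y = |\vec x||\vec y|\cos\alpha, \tag{1.4.8}$$

which is the formula we want. □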


Example 1.4.4 (Finding an angle). What is the angle between the diagonal
of a cube and any side? Let us assume our cube is the unit cube $0 \le x, y, z \le 1$,
so that the standard basis vectors $\vec e_1, \vec e_2, \vec e_3$ are sides, and the vector
$\vec d = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ is a diagonal. The length of the diagonal is $|\vec d| = \sqrt 3$, so the
required angle α satisfies

$$\cos\alpha = \frac{\vec d\cdot\vec e_i}{|\vec d||\vec e_i|} = \frac{1}{\sqrt 3}. \tag{1.4.12}$$

Thus $\alpha = \arccos(\sqrt 3/3) \approx 54.7°$. △

Figure 1.4.3.

The projection of $\vec y$ onto the
line spanned by $\vec x$ is $\vec z$. This gives

$$\vec x\cdot\vec y = |\vec x||\vec y|\cos\alpha = |\vec x||\vec y|\frac{|\vec z|}{|\vec y|} = |\vec x||\vec z|.$$
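The computation in Example 1.4.4 is easy to reproduce. Below is a small Python sketch (function names are ours) that computes the angle between two vectors from Equation 1.4.8; the value passed to `math.acos` is clamped to [-1, 1] as a guard against floating-point rounding, a precaution that is our addition and not part of the mathematics.

```python
import math

def dot(x, y):
    """Dot product of two vectors of the same size."""
    return sum(xi * yi for xi, yi in zip(x, y))

def length(x):
    """Length |x| = sqrt(x . x) of a vector."""
    return math.sqrt(dot(x, x))

def angle(v, w):
    """Angle in radians between two nonzero vectors v and w."""
    c = dot(v, w) / (length(v) * length(w))
    c = max(-1.0, min(1.0, c))      # guard against rounding just outside [-1, 1]
    return math.acos(c)

d = [1, 1, 1]       # diagonal of the unit cube
e1 = [1, 0, 0]      # one of its sides
alpha_degrees = math.degrees(angle(d, e1))
print(round(alpha_degrees, 1))      # 54.7
```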

Corollary 1.4.5 restates Proposition 1.4.3 in terms of projections; it is illus- 
trated by Figure 1.4.3. 

Corollary 1.4.5 (The dot product in terms of projections). If $\vec x$ and
$\vec y$ are two vectors in $\mathbb{R}^2$ or $\mathbb{R}^3$, then $\vec x\cdot\vec y$ is the product of $|\vec x|$ and the signed
length of the projection of $\vec y$ onto the line spanned by $\vec x$. The signed length
of the projection is positive if it points in the direction of $\vec x$; it is negative if
it points in the opposite direction.


Defining angles between vectors in R n 

We want to use Equation 1.4.8 backwards, to provide a definition of angles in
$\mathbb{R}^n$, where we can’t invoke elementary geometry when n > 3. Thus, we want to
define

$$\alpha = \arccos\frac{\vec v\cdot\vec w}{|\vec v||\vec w|}, \quad\text{i.e., define } \alpha \text{ so that}\quad \cos\alpha = \frac{\vec v\cdot\vec w}{|\vec v||\vec w|}. \tag{1.4.13}$$

Figure 1.4.4.

Left to right: a positive dis- 
criminant gives two roots; a zero 
discriminant gives one root; a neg- 
ative discriminant gives no roots. 


But there’s a problem: how do we know that

$$-1 \le \frac{\vec v\cdot\vec w}{|\vec v||\vec w|} \le 1, \tag{1.4.14}$$

so that the arccosine exists? Schwarz’s inequality provides the answer.^10 It is
an absolutely fundamental result regarding dot products.


Theorem 1.4.6 (Schwarz’s Inequality). For any two vectors $\vec v$ and $\vec w$,

$$|\vec v\cdot\vec w| \le |\vec v||\vec w|. \tag{1.4.15}$$

The two sides are equal if and only if $\vec v$ or $\vec w$ is a multiple of the other by a
scalar.

Proof. Consider the function $|\vec v + t\vec w|^2$ as a function of t. It is a second degree
polynomial of the form $at^2 + bt + c$; in fact,

$$|\vec v + t\vec w|^2 = |t\vec w + \vec v|^2 = |\vec w|^2 t^2 + 2(\vec v\cdot\vec w)t + |\vec v|^2. \tag{1.4.16}$$

All its values are ≥ 0, since it is the left-hand term squared; therefore, the
graph of the polynomial must not cross the t-axis. But remember the quadratic
formula you learned in high school: for an equation of the form $at^2 + bt + c = 0$,

$$t = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. \tag{1.4.17}$$

If the discriminant (the quantity $b^2 - 4ac$ under the square root sign) is positive,
the equation will have two distinct solutions, and its graph will cross the t-axis
twice, as shown in the left-most graph in Figure 1.4.4.

Substituting $|\vec w|^2$ for a, $2\vec v\cdot\vec w$ for b and $|\vec v|^2$ for c, we see that the discriminant
of Equation 1.4.16 is

$$4(\vec v\cdot\vec w)^2 - 4|\vec v|^2|\vec w|^2. \tag{1.4.18}$$

All the values of Equation 1.4.16 are ≥ 0, so its discriminant can’t be positive:

$$4(\vec v\cdot\vec w)^2 - 4|\vec v|^2|\vec w|^2 \le 0, \quad\text{and therefore}\quad |\vec v\cdot\vec w| \le |\vec v||\vec w|,$$

which is what we wanted to show.

The second part of Schwarz’s inequality, that $|\vec v\cdot\vec w| = |\vec v||\vec w|$ if and only
if $\vec v$ or $\vec w$ is a multiple of the other by a scalar, has two directions. If $\vec w$ is a
multiple of $\vec v$, say $\vec w = t\vec v$, then

$$|\vec v\cdot\vec w| = |t||\vec v|^2 = (|\vec v|)(|t||\vec v|) = |\vec v||\vec w|, \tag{1.4.19}$$

10 A more abstract form of Schwarz’s inequality concerns inner products of vectors
in possibly infinite-dimensional vector spaces, not just the standard dot product in
$\mathbb{R}^n$. The general case is no more difficult to prove: the definition of an abstract inner
product is precisely what is required to make this proof work.

The proof of Schwarz’s inequality
is clever; you can follow it
line by line, like any proof which
is written out in detail, but you
won’t find it by simply following
your nose! There is considerable
contention for the credit: Cauchy
and Bunyakovski are often considered
the inventors, particularly in
France and in Russia.


We see that the dot product of
two vectors is positive if the angle
between them is less than π/2, and
negative if it is bigger than π/2.


We prefer the word orthogonal 
to its synonym perpendicular for 
etymological reasons. Orthogonal 
comes from the Greek for “right 
angle,” while perpendicular comes 
from the Latin for “plumb line,” 
which suggests a vertical line. The 
word normal is also used, both 
as a noun and as an adjective, to 
express a right angle. 



Figure 1.4.5.
The triangle inequality:

$$|\vec x + \vec y| \le |\vec x| + |\vec y|.$$


so that Schwarz’s inequality is satisfied as an equality. 

Conversely, if $|\vec v\cdot\vec w| = |\vec v||\vec w|$, then the discriminant in Equation 1.4.18 is
zero, so the polynomial has a single root $t_0$:

$$|\vec v + t_0\vec w|^2 = 0, \quad\text{i.e.,}\quad \vec v = -t_0\vec w, \tag{1.4.20}$$

and $\vec v$ is a multiple of $\vec w$. □
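The role of the discriminant in this proof can be seen on concrete numbers. The Python sketch below (names and sample vectors are ours) computes the discriminant of Equation 1.4.18 for a generic pair of vectors, where it is negative, and for a pair in which one vector is a multiple of the other, where it is zero. Integer entries are used so that the arithmetic is exact.

```python
def dot(x, y):
    """Dot product of two vectors of the same size."""
    return sum(xi * yi for xi, yi in zip(x, y))

def discriminant(v, w):
    """Discriminant 4(v.w)^2 - 4|v|^2|w|^2 of the polynomial |v + t w|^2 in t."""
    return 4 * dot(v, w) ** 2 - 4 * dot(v, v) * dot(w, w)

v = [1, 2, 3]
w = [-2, 0, 5]                       # not a multiple of v
d_generic = discriminant(v, w)       # negative: strict Schwarz inequality

w_parallel = [3 * x for x in v]      # w = 3v
d_parallel = discriminant(v, w_parallel)   # zero: equality in Schwarz

print(d_generic, d_parallel)         # -948 0
```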


Schwarz’s inequality allows us to define the angle between two vectors, since
we are now assured that

$$-1 \le \frac{\vec a\cdot\vec b}{|\vec a||\vec b|} \le 1. \tag{1.4.14}$$


Definition 1.4.7 (The angle between two vectors). The angle between
two vectors $\vec v$ and $\vec w$ in $\mathbb{R}^n$ is that angle α satisfying $0 \le \alpha \le \pi$ such that

$$\cos\alpha = \frac{\vec v\cdot\vec w}{|\vec v||\vec w|}. \tag{1.4.21}$$


Corollary 1.4.8. Two vectors are orthogonal if their dot product is zero. 

Schwarz’s inequality also gives us the triangle inequality: when traveling 
from London to Paris, it is shorter to go across the English Channel than by 
way of Moscow. 

Theorem 1.4.9 (The triangle inequality). For any vectors $\vec x$ and $\vec y$ in
$\mathbb{R}^n$,

$$|\vec x + \vec y| \le |\vec x| + |\vec y|. \tag{1.4.22}$$

Proof. This inequality is proved by the following computation:

$$|\vec x + \vec y|^2 = |\vec x|^2 + 2\vec x\cdot\vec y + |\vec y|^2 \underset{\text{Schwarz}}{\le} |\vec x|^2 + 2|\vec x||\vec y| + |\vec y|^2 = (|\vec x| + |\vec y|)^2, \tag{1.4.23}$$

so that $|\vec x + \vec y| \le |\vec x| + |\vec y|$. □

This is called the triangle inequality because it can be interpreted (in the
case of strict inequality, not ≤) as the statement that the length of one side of
a triangle is less than the sum of the lengths of the other two sides. If a triangle
has vertices 0, $\vec x$ and $\vec x + \vec y$, then the lengths of the sides are $|\vec x|$, $|\vec x + \vec y - \vec x| = |\vec y|$
and $|\vec x + \vec y|$, as shown in Figure 1.4.5.



In some texts, |A| denotes the 
determinant of the matrix A. We 
use detA to denote the determi- 
nant. 

The length |A| is also called 
the Minkowski norm (pronounced 
MinKOVski). We find it simpler 
to call it the length, generalizing 
the notion of length of a vector. 
Indeed, the length of an n × 1
matrix is identical to the length
of the vector in $\mathbb{R}^n$ with the same
entries.

You shouldn’t take the word 
“length” too literally; it’s just a 
name for one way to measure ma- 
trices. (A more sophisticated mea- 
sure, considerably harder to com- 
pute, is discussed in Section 2.8.) 


Thinking of an m × n matrix
as a point in $\mathbb{R}^{nm}$, we can see
that two matrices A and B (and
therefore, the corresponding linear
transformations) are close if the
length of their difference is small;
i.e., if |A − B| is small.

Measuring matrices

The dot product gives a way to measure the length of vectors. We will also
need a way to measure the “length” of a matrix (not to be confused with either
its height or its width). There is an obvious way to do this: consider an m × n
matrix as a point in $\mathbb{R}^{nm}$, and use the ordinary dot product.


Definition 1.4.10 (The length of a matrix). If A is an n × m matrix,
its length |A| is the square root of the sum of the squares of all its entries:

$$|A| = \sqrt{\sum_{i=1}^{n}\sum_{j=1}^{m} a_{i,j}^2}. \tag{1.4.24}$$

For example, the length |A| of the matrix $A = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$ is $\sqrt 6$, since
1 + 4 + 0 + 1 = 6. What is the length of the matrix $B = \begin{bmatrix} 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix}$?^11

If you find double sum notation confusing, Equation 1.4.24 can be rewritten
as a single sum:

$$|A|^2 = \sum_{\substack{i=1,\dots,n \\ j=1,\dots,m}} a_{i,j}^2 : \quad\text{we sum } a_{i,j}^2 \text{ for } i \text{ from 1 to } n \text{ and } j \text{ from 1 to } m.$$


As in the case of the length of a vector, do not confuse the length |A| of a 
matrix with the absolute value of a number. (But the length of the 1 x 1 matrix 
consisting of the single entry [n] is indeed the absolute value of n.) 
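Definition 1.4.10 amounts to flattening the matrix into one long list of numbers and taking the length of that list as a vector. A minimal Python sketch (the function name `matrix_length` is ours), applied to the matrices A and B of the example above and to a 1 × 1 matrix:

```python
import math

def matrix_length(A):
    """Length |A|: square root of the sum of the squares of all entries of A."""
    return math.sqrt(sum(a * a for row in A for a in row))

A = [[1, 2],
     [0, 1]]
B = [[1, 2, 1],
     [1, 0, 3]]

print(matrix_length(A) ** 2)     # approximately 6, so |A| is the square root of 6
print(matrix_length(B))          # 4.0, since the squares of the entries sum to 16
print(matrix_length([[-7]]))     # 7.0: for a 1 x 1 matrix, the absolute value
```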


Length and matrix multiplication 

We said earlier that the point of writing the entries of $\mathbb{R}^{mn}$ as matrices is
to allow matrix multiplication, yet it isn’t clear that this notion of length, in 
which a matrix is considered simply as a list of numbers, is in any way related 
to matrix multiplication. The following proposition says that it is. 

Proposition 1.4.11. (a) If A is an n × m matrix, and $\vec b$ is a vector in $\mathbb{R}^m$,
then

$$|A\vec b| \le |A||\vec b|. \tag{1.4.25}$$

(b) If A is an n × m matrix, and B is an m × k matrix, then

$$|AB| \le |A||B|. \tag{1.4.26}$$

11 |B| = 4, since 1 + 4 + 0 + 1 + 1 + 9 = 16.

Proposition 1.4.11 will soon become
an old friend; it is a very useful
tool in a number of proofs.

Of course, part (a) is the spe- 
cial case of part (b) (where k = 1), 
but the intuitive content is suffi- 
ciently different that we state the 
two parts separately. In any case, 
the proof of the second part fol- 
lows from the first. 


Proof. First note that if the matrix A consists of a single row, i.e., if $A = \vec a^\top$
is the transpose of a vector $\vec a$, the assertion of the theorem is exactly Schwarz’s
inequality:

$$\underbrace{|A\vec b|}_{|\vec a^\top\vec b|} = |\vec a\cdot\vec b| \le |\vec a||\vec b| = \underbrace{|A||\vec b|}_{|\vec a^\top||\vec b|}. \tag{1.4.27}$$

The idea of the proof is to consider that the rows of A are the transposes of
vectors $\vec a_1, \dots, \vec a_n$, as shown in Figure 1.4.6, and to apply the argument above
to each row separately. Remember that since the ith row of A is $\vec a_i^\top$, the ith
entry $(A\vec b)_i$ of $A\vec b$ is precisely the dot product $\vec a_i\cdot\vec b$. (This accounts for the
equal sign marked (1) in Equation 1.4.28.)


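Remark 1.4.12. It follows from
Proposition 1.4.11 that a linear
transformation is continuous. Saying
that a linear transformation A
is continuous means that for every
ε > 0 and every $\vec x \in \mathbb{R}^n$, there exists
a δ > 0 such that if $|\vec x - \vec y| < \delta$, then
$|A\vec x - A\vec y| < \epsilon$. By Proposition
1.4.11,

$$|A\vec x - A\vec y| = |A(\vec x - \vec y)| \le |A||\vec x - \vec y|.$$

So, set

$$\delta = \frac{\epsilon}{|A|}.$$

Then if we have $|\vec x - \vec y| < \delta$,

$$|\vec x - \vec y| < \frac{\epsilon}{|A|}$$

and

$$|A\vec x - A\vec y| < |A|\frac{\epsilon}{|A|} = \epsilon.$$

We have actually proved more:
the δ we found did not depend on
$\vec x$; this means that a linear transformation
$\mathbb{R}^n \to \mathbb{R}^n$ is always uniformly
continuous. The definition
of uniform continuity was given in
Equation 0.2.6. △

$$\underbrace{\begin{bmatrix} \vec a_1^\top \\ \vec a_2^\top \\ \vdots \\ \vec a_n^\top \end{bmatrix}}_{\text{matrix } A} \vec b = \underbrace{\begin{bmatrix} \vec a_1\cdot\vec b = (A\vec b)_1 \\ \vec a_2\cdot\vec b = (A\vec b)_2 \\ \vdots \\ \vec a_n\cdot\vec b = (A\vec b)_n \end{bmatrix}}_{\text{vector } A\vec b}$$

FIGURE 1.4.6. Think of the rows of A as the transposes of the vectors $\vec a_1, \vec a_2, \dots, \vec a_n$.
Then the product $\vec a_i^\top\vec b$ is the same as the dot product $\vec a_i\cdot\vec b$. Note that $A\vec b$ is a vector,
not a matrix.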


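This leads to

$$|A\vec b|^2 = \sum_{i=1}^{n} (A\vec b)_i^2 \underset{(1)}{=} \sum_{i=1}^{n} (\vec a_i\cdot\vec b)^2. \tag{1.4.28}$$

Now use Schwarz’s inequality (2); factor out $|\vec b|^2$ (step 3), and consider (step
4) the length squared of A to be the sum of the squares of the lengths of $\vec a_i$.
(Of course, $|\vec a_i|^2 = |\vec a_i^\top|^2$.) Thus,

$$\sum_{i=1}^{n} (\vec a_i\cdot\vec b)^2 \underset{(2)}{\le} \sum_{i=1}^{n} |\vec a_i|^2|\vec b|^2 \underset{(3)}{=} \Big(\sum_{i=1}^{n} |\vec a_i|^2\Big)|\vec b|^2 \underset{(4)}{=} |A|^2|\vec b|^2. \tag{1.4.29}$$

This gives us the result we wanted:

$$|A\vec b|^2 \le |A|^2|\vec b|^2.$$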


When solving big systems of
linear equations was in any case
out of the question, determinants
were a reasonable approach to the
theory of linear equations. With
the advent of computers they lost
importance, as systems of linear
equations can be solved far more
effectively with row reduction (to
be discussed in Sections 2.1 and
2.2). However, determinants have
an interesting geometric interpretation;
in Chapters 4, 5 and especially
6, we use determinants constantly.

Recall that the formula for the
inverse of a 2 × 2 matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is

$$A^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$$

So a 2 × 2 matrix A is invertible if
and only if det A ≠ 0.

In this section we limit our discussion
to determinants of 2 × 2
and 3 × 3 matrices; we discuss determinants
in higher dimensions in
Section 4.8.

For the second, we decompose the matrix B into its columns and proceed as
above. Let $\vec b_1, \dots, \vec b_k$ be the columns of B. Then

$$|AB|^2 = \sum_{j=1}^{k} |A\vec b_j|^2 \le \sum_{j=1}^{k} |A|^2|\vec b_j|^2 = |A|^2\sum_{j=1}^{k} |\vec b_j|^2 = |A|^2|B|^2, \tag{1.4.30}$$

which proves the second part. □
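Both parts of Proposition 1.4.11 can be spot-checked numerically. In the Python sketch below (function names and the sample vector are ours; A and B are the matrices used in the length example), the vector $\vec b$ is stored as a one-column matrix so that part (a) appears literally as the case k = 1 of part (b).

```python
import math

def matrix_length(A):
    """Length |A|: square root of the sum of the squares of all entries."""
    return math.sqrt(sum(a * a for row in A for a in row))

def matmul(A, B):
    """Matrix product AB of an n x m matrix and an m x k matrix."""
    return [[sum(A[i][p] * B[p][j] for p in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2],
     [0, 1]]
B = [[1, 2, 1],
     [1, 0, 3]]
b = [[3],
     [-1]]                      # a vector in R^2, written as a 2 x 1 matrix

# Part (a): |Ab| <= |A||b|
lhs_a = matrix_length(matmul(A, b))
rhs_a = matrix_length(A) * matrix_length(b)

# Part (b): |AB| <= |A||B|
lhs_b = matrix_length(matmul(A, B))
rhs_b = matrix_length(A) * matrix_length(B)

print(lhs_a <= rhs_a, lhs_b <= rhs_b)   # True True
```

For these matrices $|AB|^2 = 72$ while $|A|^2|B|^2 = 96$, so the inequality is far from an equality; in general it can be quite crude.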


Determinants in R 2 

The determinant is a function of square matrices: it takes a square matrix as 
input and gives a number as output. 


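Definition 1.4.13 (Determinant in $\mathbb{R}^2$). The determinant of a 2 × 2
matrix $\begin{bmatrix} a_1 & b_1 \\ a_2 & b_2 \end{bmatrix}$ is

$$\det\begin{bmatrix} a_1 & b_1 \\ a_2 & b_2 \end{bmatrix} = a_1 b_2 - a_2 b_1. \tag{1.4.31}$$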


The determinant is an interesting number to associate to a matrix because
if we think of the determinant as a function of the vectors $\vec a$ and $\vec b$ in $\mathbb{R}^2$, then
it has a geometric interpretation, illustrated by Figure 1.4.7:

Proposition 1.4.14 (Geometric interpretation of the determinant in
$\mathbb{R}^2$). (a) The area of the parallelogram spanned by the vectors

$$\vec a = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} \quad\text{and}\quad \vec b = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$$

is $|\det[\vec a, \vec b]|$.

(b) The determinant $\det[\vec a, \vec b]$ is positive if and only if $\vec b$ lies counterclockwise
from $\vec a$; it is negative if and only if $\vec b$ lies clockwise from $\vec a$.


Proof. (a) The area of the parallelogram is its height times its base. Its base
is $|\vec b| = \sqrt{b_1^2 + b_2^2}$. Its height h is

$$h = \sin\theta\,|\vec a| = \sin\theta\sqrt{a_1^2 + a_2^2}. \tag{1.4.32}$$

We can compute cos θ by using Equation 1.4.8:

$$\cos\theta = \frac{\vec a\cdot\vec b}{|\vec a||\vec b|} = \frac{a_1 b_1 + a_2 b_2}{\sqrt{a_1^2 + a_2^2}\sqrt{b_1^2 + b_2^2}}. \tag{1.4.33}$$

Figure 1.4.7.

The area of the parallelogram
spanned by $\vec a$ and $\vec b$ is $|\det[\vec a, \vec b]|$.

Figure 1.4.8.

In the two cases at the top, the
angle between $\vec b$ and $\vec c$ is less than
π/2, so $\det(\vec a, \vec b) > 0$; this corresponds
to $\vec b$ being counterclockwise
from $\vec a$. At the bottom, the
angle between $\vec b$ and $\vec c$ is more
than π/2; in this case, $\vec b$ is clockwise
from $\vec a$, and $\det(\vec a, \vec b)$ is negative.

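So we get sin θ as follows:

$$\begin{aligned} \sin\theta = \sqrt{1 - \cos^2\theta} &= \sqrt{\frac{(a_1^2 + a_2^2)(b_1^2 + b_2^2) - (a_1 b_1 + a_2 b_2)^2}{(a_1^2 + a_2^2)(b_1^2 + b_2^2)}} \\ &= \sqrt{\frac{a_1^2 b_1^2 + a_1^2 b_2^2 + a_2^2 b_1^2 + a_2^2 b_2^2 - a_1^2 b_1^2 - 2a_1 b_1 a_2 b_2 - a_2^2 b_2^2}{(a_1^2 + a_2^2)(b_1^2 + b_2^2)}} \\ &= \sqrt{\frac{(a_1 b_2 - a_2 b_1)^2}{(a_1^2 + a_2^2)(b_1^2 + b_2^2)}}. \end{aligned} \tag{1.4.34}$$

Using this value for sin θ in the equation for the area of a parallelogram gives

$$\text{Area} = \underbrace{|\vec b|}_{\text{base}}\ \underbrace{|\vec a|\sin\theta}_{\text{height}} = \underbrace{\sqrt{b_1^2 + b_2^2}}_{\text{base}}\ \underbrace{\sqrt{a_1^2 + a_2^2}\sqrt{\frac{(a_1 b_2 - a_2 b_1)^2}{(a_1^2 + a_2^2)(b_1^2 + b_2^2)}}}_{\text{height}} = \underbrace{|a_1 b_2 - a_2 b_1|}_{\text{determinant}}.$$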


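(b) The vector $\vec c$ obtained by rotating $\vec a$ counterclockwise by π/2 is
$\vec c = \begin{bmatrix} -a_2 \\ a_1 \end{bmatrix}$, and we see that $\vec c\cdot\vec b = \det[\vec a, \vec b]$:

$$\begin{bmatrix} -a_2 \\ a_1 \end{bmatrix}\cdot\begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = -a_2 b_1 + a_1 b_2 = \det\begin{bmatrix} a_1 & b_1 \\ a_2 & b_2 \end{bmatrix}. \tag{1.4.36}$$

Since (Proposition 1.4.3) the dot product of two vectors is positive if the angle
between them is less than π/2, the determinant is positive if the angle between
$\vec b$ and $\vec c$ is less than π/2. So $\vec b$ lies counterclockwise from $\vec a$, as shown in Figure
1.4.8. □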
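Both parts of Proposition 1.4.14 can be illustrated on a small example. The Python sketch below (function names and the sample vectors are ours) computes the determinant of two vectors in the plane, compares its absolute value with the area computed as base times height via sin θ, and shows the sign change when the two vectors are exchanged.

```python
import math

def det2(a, b):
    """Determinant a1*b2 - a2*b1 of the 2 x 2 matrix with columns a and b."""
    return a[0] * b[1] - a[1] * b[0]

def length(v):
    """Length of a vector in the plane."""
    return math.sqrt(v[0] ** 2 + v[1] ** 2)

a = [2, 0]
b = [1, 3]          # b lies counterclockwise from a

# Area from the determinant
area_det = abs(det2(a, b))

# Area as base times height, with sin(theta) obtained from cos(theta)
cos_theta = (a[0] * b[0] + a[1] * b[1]) / (length(a) * length(b))
sin_theta = math.sqrt(1 - cos_theta ** 2)
area_trig = length(b) * length(a) * sin_theta

print(area_det, area_trig)        # 6 and approximately 6.0
print(det2(a, b), det2(b, a))     # 6 -6: exchanging the vectors flips the sign
```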

Exercise 1.4.6 gives a more geometric proof of Proposition 1.4.14. 

Determinants in R 3 

Definition 1.4.15 (Determinant in $\mathbb{R}^3$). The determinant of a 3 × 3
matrix is

$$\det\begin{bmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{bmatrix} = a_1(b_2 c_3 - b_3 c_2) - a_2(b_1 c_3 - b_3 c_1) + a_3(b_1 c_2 - b_2 c_1).$$

Exercise 1.4.12 shows that a
3 × 3 matrix is invertible if its determinant
is not 0.

For larger matrices, the formu- 
las rapidly get out of hand; we will 
see in Section 4.8 that such de- 
terminants can be computed much 
more reasonably by row (or col- 
umn) reduction. 

The determinant can also be 
computed using the entries of the 
first row, rather than of the first 
column, as coefficients. 


The cross product exists only in
$\mathbb{R}^3$ (and to some extent in $\mathbb{R}^7$).


Each entry of the first column of the original matrix serves as the coefficient
for the determinant of a 2 × 2 matrix; the first and third ($a_1$ and $a_3$) are positive,
the middle one is negative. To remember which 2 × 2 matrix goes with which
coefficient, cross out the row and column the coefficient is in; what is left is the
matrix you want. To get the 2 × 2 matrix for the coefficient $a_2$:

$$\begin{bmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{bmatrix} \xrightarrow{\ \text{cross out row 2 and column 1}\ } \begin{bmatrix} b_1 & c_1 \\ b_3 & c_3 \end{bmatrix}. \tag{1.4.37}$$


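Example 1.4.16 (Determinant of a 3 × 3 matrix).

$$\begin{aligned} \det\begin{bmatrix} 3 & 1 & -2 \\ 1 & 2 & 4 \\ 2 & 0 & 1 \end{bmatrix} &= 3\det\begin{bmatrix} 2 & 4 \\ 0 & 1 \end{bmatrix} - 1\det\begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix} + 2\det\begin{bmatrix} 1 & -2 \\ 2 & 4 \end{bmatrix} \\ &= 3(2 - 0) - (1 + 0) + 2(4 + 4) = 21. \quad △ \end{aligned}$$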
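The expansion along the first column in Definition 1.4.15 can be written as a short function. This Python sketch (function names are ours) stores a matrix as a list of rows and reproduces the computation of Example 1.4.16.

```python
def det2(M):
    """Determinant of a 2 x 2 matrix given as a list of two rows."""
    return M[0][0] * M[1][1] - M[1][0] * M[0][1]

def minor(M, i, j):
    """The matrix left after crossing out row i and column j of M."""
    return [[M[r][c] for c in range(len(M)) if c != j]
            for r in range(len(M)) if r != i]

def det3(M):
    """Determinant of a 3 x 3 matrix, expanded along the first column.

    The entries of the first column serve as coefficients, with signs +, -, +.
    """
    return (M[0][0] * det2(minor(M, 0, 0))
            - M[1][0] * det2(minor(M, 1, 0))
            + M[2][0] * det2(minor(M, 2, 0)))

M = [[3, 1, -2],
     [1, 2, 4],
     [2, 0, 1]]

print(minor(M, 1, 0))   # [[1, -2], [0, 1]]: the 2 x 2 matrix for the middle coefficient
print(det3(M))          # 3*2 - 1*1 + 2*8 = 21
```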


The cross product of two vectors 

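Although the determinant is a number, as is the dot product, the cross product
is a vector:

Definition 1.4.17 (Cross product in $\mathbb{R}^3$). The cross product $\vec a\times\vec b$ in
$\mathbb{R}^3$ is

$$\begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}\times\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} a_2 b_3 - a_3 b_2 \\ -a_1 b_3 + a_3 b_1 \\ a_1 b_2 - a_2 b_1 \end{bmatrix}. \tag{1.4.39}$$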


Think of your vectors as a 3 x 2 matrix; first cover up the first row and take 
the determinant of what’s left. That gives the first entry of the cross product. 
Then cover up the second row and take minus the determinant of what’s left, 
giving the second entry of the cross product. The third entry is obtained by 
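covering up the third row and taking the determinant of what’s left.

Example 1.4.18 (Cross product of two vectors in $\mathbb{R}^3$).

$$\begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix}\times\begin{bmatrix} 2 \\ 1 \\ 4 \end{bmatrix} = \begin{bmatrix} \det\begin{bmatrix} 0 & 1 \\ 1 & 4 \end{bmatrix} \\ -\det\begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix} \\ \det\begin{bmatrix} 3 & 2 \\ 0 & 1 \end{bmatrix} \end{bmatrix} = \begin{bmatrix} -1 \\ -10 \\ 3 \end{bmatrix}. \tag{1.4.40}$$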
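Definition 1.4.17 is short enough to code directly. The Python sketch below (function names are ours) recomputes Example 1.4.18, then checks on this example two properties of the cross product: it is orthogonal to both factors, and its length squared equals $|\vec a|^2|\vec b|^2 - (\vec a\cdot\vec b)^2$, which is the square of the area $|\vec a||\vec b|\sin\theta$ of the parallelogram spanned by the two vectors.

```python
def dot(x, y):
    """Dot product of two vectors of the same size."""
    return sum(xi * yi for xi, yi in zip(x, y))

def cross(a, b):
    """Cross product of two vectors in R^3 (Definition 1.4.17)."""
    return [a[1] * b[2] - a[2] * b[1],
            -a[0] * b[2] + a[2] * b[0],
            a[0] * b[1] - a[1] * b[0]]

a = [3, 0, 1]
b = [2, 1, 4]
n = cross(a, b)
print(n)                          # [-1, -10, 3]

# Orthogonality to both factors
print(dot(a, n), dot(b, n))       # 0 0

# Length squared versus the squared area of the parallelogram
print(dot(n, n), dot(a, a) * dot(b, b) - dot(a, b) ** 2)   # 110 110
```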


Like the determinant, the cross 
product has a geometric interpre- 
tation. 

The right-hand rule: if you
put the thumb of your right hand
on $\vec a$ and your index finger on $\vec b$,
while bending your third finger,
then your third finger points in the
direction of $\vec a\times\vec b$. (Alternatively,
curl the fingers of your right hand
from $\vec a$ to $\vec b$; then your right thumb
will point in the direction of $\vec a\times\vec b$.)


Proposition 1.4.19 (Geometric interpretation of the cross product).
The cross product $\vec a\times\vec b$ is the vector satisfying three properties:

(1) It is orthogonal to the plane spanned by $\vec a$ and $\vec b$; i.e.,

$$\vec a\cdot(\vec a\times\vec b) = 0 \quad\text{and}\quad \vec b\cdot(\vec a\times\vec b) = 0. \tag{1.4.41}$$

(2) Its length $|\vec a\times\vec b|$ is the area of the parallelogram spanned by $\vec a$ and
$\vec b$.

(3) The three vectors $\vec a$, $\vec b$ and $\vec a\times\vec b$ satisfy the right-hand rule.

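Proof. For the first part, it is not difficult to check that the cross product
$\vec a\times\vec b$ is orthogonal to both $\vec a$ and $\vec b$: we check that the dot product in each
case is zero (Corollary 1.4.8). Thus $\vec a\times\vec b$ is orthogonal to $\vec a$ because

$$\begin{aligned} \vec a\cdot(\vec a\times\vec b) &= \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}\cdot\underbrace{\begin{bmatrix} a_2 b_3 - a_3 b_2 \\ -a_1 b_3 + a_3 b_1 \\ a_1 b_2 - a_2 b_1 \end{bmatrix}}_{\text{Definition 1.4.17 of } \vec a\times\vec b} \\ &= a_1 a_2 b_3 - a_1 a_3 b_2 - a_1 a_2 b_3 + a_2 a_3 b_1 + a_1 a_3 b_2 - a_2 a_3 b_1 = 0. \end{aligned} \tag{1.4.42}$$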

For the second part, the area of the parallelogram spanned by $\vec a$ and $\vec b$ is
$|\vec a||\vec b|\sin\theta$, where θ is the angle between $\vec a$ and $\vec b$. We know (Equation 1.4.8)
that

$$\cos\theta = \frac{\vec a\cdot\vec b}{|\vec a||\vec b|} = \frac{a_1 b_1 + a_2 b_2 + a_3 b_3}{\sqrt{a_1^2 + a_2^2 + a_3^2}\sqrt{b_1^2 + b_2^2 + b_3^2}}, \tag{1.4.43}$$

so we have

$$\begin{aligned} \sin\theta = \sqrt{1 - \cos^2\theta} &= \sqrt{1 - \frac{(a_1 b_1 + a_2 b_2 + a_3 b_3)^2}{(a_1^2 + a_2^2 + a_3^2)(b_1^2 + b_2^2 + b_3^2)}} \\ &= \sqrt{\frac{(a_1^2 + a_2^2 + a_3^2)(b_1^2 + b_2^2 + b_3^2) - (a_1 b_1 + a_2 b_2 + a_3 b_3)^2}{(a_1^2 + a_2^2 + a_3^2)(b_1^2 + b_2^2 + b_3^2)}}, \end{aligned} \tag{1.4.44}$$

so that

$$|\vec a||\vec b|\sin\theta = \sqrt{(a_1^2 + a_2^2 + a_3^2)(b_1^2 + b_2^2 + b_3^2) - (a_1 b_1 + a_2 b_2 + a_3 b_3)^2}. \tag{1.4.45}$$


The last equality in Equation
1.4.47 comes of course from Definition
1.4.17.

You may object that the middle
term of the square root looks different
than the middle entry of the
cross product as given in Definition
1.4.17, but since we are squaring it,

$$(-a_1 b_3 + a_3 b_1)^2 = (a_1 b_3 - a_3 b_1)^2.$$


Carrying out the multiplication results in a formula for the area that looks
worse than it is: a long string of terms too big to fit on this page under one
square root sign. That’s a good excuse for omitting it here. But if you do the
computations you’ll see that after cancellations we have for the right-hand side:

$$\sqrt{\underbrace{a_1^2 b_2^2 + a_2^2 b_1^2 - 2a_1 b_1 a_2 b_2}_{(a_1 b_2 - a_2 b_1)^2} + \underbrace{a_1^2 b_3^2 + a_3^2 b_1^2 - 2a_1 b_1 a_3 b_3}_{(a_1 b_3 - a_3 b_1)^2} + \underbrace{a_2^2 b_3^2 + a_3^2 b_2^2 - 2a_2 b_2 a_3 b_3}_{(a_2 b_3 - a_3 b_2)^2}}, \tag{1.4.46}$$

which conveniently gives us

$$\text{Area} = |\vec a||\vec b|\sin\theta = \sqrt{(a_1 b_2 - a_2 b_1)^2 + (a_1 b_3 - a_3 b_1)^2 + (a_2 b_3 - a_3 b_2)^2} = |\vec a\times\vec b|. \tag{1.4.47}$$


So far, then, we have seen that the cross product $\vec a\times\vec b$ is orthogonal to $\vec a$
and $\vec b$, and that its length is the area of the parallelogram spanned by $\vec a$ and $\vec b$.
What about the right-hand rule? Equation 1.4.39 for the cross product cannot
actually specify that the three vectors obey the right-hand rule, because your
right hand is not an object of mathematics.

What we can show is that if one of your hands fits $\vec e_1, \vec e_2, \vec e_3$, then it will also
fit $\vec a, \vec b, \vec a\times\vec b$. Suppose $\vec a$ and $\vec b$ are not collinear. You have one hand that fits
$\vec a, \vec b, \vec a\times\vec b$; i.e., you can put the thumb in the direction of $\vec a$, your index finger in
the direction of $\vec b$ and the middle finger in the direction of $\vec a\times\vec b$ without bending
your knuckles backwards. You can move $\vec a$ to point in the same direction as
$\vec e_1$, for instance, by rotating all of space (in particular $\vec b$, $\vec a\times\vec b$ and your hand)
around the line perpendicular to the plane containing $\vec a$ and $\vec e_1$. Now rotate all
of space (in particular $\vec a\times\vec b$ and your hand) around the x-axis, until $\vec b$ is in the
(x, y)-plane, with the y-coordinate positive. These movements simply rotated
your hand, so it still fits the vectors.

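Now we see that our vectors have become

$$\vec a = \begin{bmatrix} a \\ 0 \\ 0 \end{bmatrix} \quad\text{and}\quad \vec b = \begin{bmatrix} b_1 \\ b_2 \\ 0 \end{bmatrix}, \quad\text{so}\quad \vec a\times\vec b = \begin{bmatrix} a \\ 0 \\ 0 \end{bmatrix}\times\begin{bmatrix} b_1 \\ b_2 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ a b_2 \end{bmatrix}.$$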

Thus, your thumb is in the direction of the positive x-axis, your index finger
is horizontal, pointing into the part of the (x, y)-plane where y is positive, and
since both a and $b_2$ are positive, your middle finger points straight up. So
the same hand will fit the vectors as will fit the standard basis vectors: the
right hand if you draw them the standard way (x-axis coming out of the paper
straight at you, y-axis pointing to the right, z-axis up). △

Geometric interpretation of the determinant in $\mathbb{R}^3$

The determinant of three vectors $\vec a$, $\vec b$ and $\vec c$ can also be thought of as the dot
product of one vector with the cross product of the other two, $\vec a\cdot(\vec b\times\vec c)$:

$$\begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}\cdot\begin{bmatrix} \det\begin{bmatrix} b_2 & c_2 \\ b_3 & c_3 \end{bmatrix} \\ -\det\begin{bmatrix} b_1 & c_1 \\ b_3 & c_3 \end{bmatrix} \\ \det\begin{bmatrix} b_1 & c_1 \\ b_2 & c_2 \end{bmatrix} \end{bmatrix} = a_1\det\begin{bmatrix} b_2 & c_2 \\ b_3 & c_3 \end{bmatrix} - a_2\det\begin{bmatrix} b_1 & c_1 \\ b_3 & c_3 \end{bmatrix} + a_3\det\begin{bmatrix} b_1 & c_1 \\ b_2 & c_2 \end{bmatrix}. \tag{1.4.49}$$


The word parallelepiped seems 
to have fallen into disuse; we’ve 
met students who got a 5 on the 
Calculus BC exam who don’t 
know what the term means. It is 
simply a possibly slanted box: a 
box with six faces, each of which 
is a parallelogram; opposite faces 
are equal. 


As such it has a geometric interpretation: 

Proposition 1.4.20. (a) The absolute value of the determinant of three 
vectors a, b, c forming a 3 x 3 matrix gives the volume of the parallelepiped 
they span . 

(b) The determinant is positive if the vectors satisfy the right-hand rule, 
and negative otherwise. 


The determinant is 0 if the
three vectors are coplanar.



Figure 1.4.9. 

The determinant of a, b, c gives 

the volume of the parallelepiped 
spanned by those vectors. 


Proof. (a) The volume is height times the area of the base, the base shown
in Figure 1.4.9 as the parallelogram spanned by $\vec b$ and $\vec c$. That area is given
by the length of the cross product, $|\vec b\times\vec c|$. The height h is the projection of $\vec a$
onto a line orthogonal to the base. Let’s choose the line spanned by the cross
product $\vec b\times\vec c$, that is, the line in the same direction as that vector. Then
$h = |\vec a|\cos\theta$, where θ is the angle between $\vec a$ and $\vec b\times\vec c$, and we have

$$\text{Volume of parallelepiped} = \underbrace{|\vec b\times\vec c|}_{\text{base}}\ \underbrace{|\vec a|\cos\theta}_{\text{height}} = \underbrace{|\vec a\cdot(\vec b\times\vec c)|}_{\text{determinant}}. \tag{1.4.50}$$

(b) The determinant is positive if cos θ > 0 (i.e., if the angle between $\vec a$ and
$\vec b\times\vec c$ is less than π/2). Put your right hand to fit $\vec b\times\vec c$, $\vec b$, $\vec c$; since $\vec b\times\vec c$ is
perpendicular to the plane spanned by $\vec b$ and $\vec c$, you can move your thumb in
any direction by any angle less than π/2, in particular, in the direction of $\vec a$.
(This requires a mathematically correct, very supple thumb.) △
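Equation 1.4.49 says that the determinant of the matrix with columns $\vec a$, $\vec b$, $\vec c$ equals $\vec a\cdot(\vec b\times\vec c)$. The Python sketch below (function names are ours) checks this on the columns of the matrix of Example 1.4.16 and reads off the volume of the parallelepiped they span as the absolute value.

```python
def dot(x, y):
    """Dot product of two vectors of the same size."""
    return sum(xi * yi for xi, yi in zip(x, y))

def cross(a, b):
    """Cross product of two vectors in R^3."""
    return [a[1] * b[2] - a[2] * b[1],
            -a[0] * b[2] + a[2] * b[0],
            a[0] * b[1] - a[1] * b[0]]

def det3(a, b, c):
    """Determinant of the 3 x 3 matrix with columns a, b, c (Definition 1.4.15)."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

# Columns of the matrix of Example 1.4.16
a = [3, 1, 2]
b = [1, 2, 0]
c = [-2, 4, 1]

triple = dot(a, cross(b, c))      # a . (b x c)
print(cross(b, c))                # [2, -1, 8]
print(triple, det3(a, b, c))      # 21 21

volume = abs(triple)              # volume of the parallelepiped spanned by a, b, c
orientation_positive = triple > 0 # True here: a, b, c satisfy the right-hand rule
print(volume, orientation_positive)
```

Exchanging two of the vectors changes the sign of the determinant but not the volume.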

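Remark. The correspondence between algebra and geometry is a constant
theme of mathematics. Figure 1.4.10 summarizes the relationships discussed in
this section. △

Correspondence of Algebra and Geometry

Operation                    | Algebra                                                      | Geometry
-----------------------------+--------------------------------------------------------------+----------------------------------------------
dot product                  | $\vec v\cdot\vec w = \sum v_i w_i$                           | $\vec v\cdot\vec w = |\vec v||\vec w|\cos\theta$
determinant of 2 × 2 matrix  | $\det\begin{bmatrix} a_1 & b_1 \\ a_2 & b_2 \end{bmatrix} = a_1 b_2 - a_2 b_1$ | $\left|\det\begin{bmatrix} a_1 & b_1 \\ a_2 & b_2 \end{bmatrix}\right|$ = Area of parallelogram
cross product                | $\begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}\times\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} a_2 b_3 - b_2 a_3 \\ b_1 a_3 - a_1 b_3 \\ a_1 b_2 - a_2 b_1 \end{bmatrix}$ | $(\vec a\times\vec b)\perp\vec a$, $(\vec a\times\vec b)\perp\vec b$; Length = area of parallelogram; Right-hand rule
determinant of 3 × 3 matrix  | $\det\begin{bmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{bmatrix} = \vec a\cdot(\vec b\times\vec c)$ | $|\det[\vec a, \vec b, \vec c]|$ = Volume of parallelepiped

Figure 1.4.10. Mathematical "objects" often have two interpretations: algebraic
and geometric.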


1.5 Convergence and Limits 


The inventors of calculus in the 17th century did not have rigorous 
definitions of limits and continuity; these were achieved only in the 1870s. 
Rigor is ultimately necessary in mathematics, but it does not always come 
first, as Archimedes acknowledged about his own work, in a manuscript 
discovered in 1906. In it Archimedes reveals that his deepest results were 
found using dubious infinitary arguments, and only later proved rigorously, 
because "it is of course easier to supply the proof when we have previously 
acquired some knowledge of the questions by the method, than it is to find it 
without any previous knowledge." (We found this story in John Stillwell's 
Mathematics and Its History.) 


In this section, we collect the relevant definitions of limits and continuity. 
Integrals, derivatives, series, approximations: calculus is all about convergence 
and limits. It could easily be argued that these notions are the hardest and 
deepest of all of mathematics. They give students a lot of trouble, and his- 
torically, mathematicians struggled to come up with correct definitions for two 
hundred years. Fortunately, these notions do not become more difficult in sev- 
eral variables than they are in one variable. 

More students have foundered on these definitions than on anything else in 
calculus: the combination of Greek letters, precise order of quantifiers, and 
inequalities is a hefty obstacle. Working through a few examples will help you 
understand what the definitions mean, but a proper appreciation can probably 
only come from use; we hope you have already started on this path in one- 
variable calculus. 


Open and closed sets 

In mathematics we often need to speak of an open set $U$; whenever we want to 
approach points of a set $U$ from every side, $U$ must be open. 

Think of a set or subset as your property, surrounded by a fence. The set is 
open (Figure 1.5.1) if the entire fence belongs to your neighbor. As long as you 
stay on your property, you can get closer and closer to the fence, but you can 
never reach it. No matter how close you are to your neighbor's property, there 
is always an epsilon-thin buffer zone of your property between you and it, just 
as no matter how close a nonzero point on the real number line is to 0, you 
can always find points that are closer. 

The set is closed (Figure 1.5.2) if you own the fence. Now, if you sit on your 
fence, there is nothing between you and your neighbor's property. If you move 
even an epsilon further, you will be trespassing. 

What if some of the fence belongs to you and some belongs to your neighbors? 
Then the set is neither open nor closed. 

Figure 1.5.1. An open set includes none of the fence; no matter how close a 
point in the open set is to the fence, you can always surround it with a ball 
of other points in the open set. 

Figure 1.5.2. A closed set includes its fence. 

Remark 1.5.1. Even very good students often don't see the point of specifying 
that a set is open. But it is absolutely essential, for example in computing 
derivatives. If a function $f$ is defined on a set that is not open, and thus contains 
at least one point $x$ that is part of the fence, then talking of the derivative of $f$ 
at $x$ is meaningless. To compute $f'(x)$ we need to compute 

$$f'(x) = \lim_{h \to 0} \frac{1}{h}\bigl(f(x + h) - f(x)\bigr), \tag{1.5.1}$$

but $f(x + h)$ won't necessarily exist for $h$ arbitrarily small, since $x + h$ may be 
outside the fence and thus not in the domain of $f$. This situation gets much 
worse in $\mathbb{R}^n$.¹² △ 

In order to define open and closed sets in proper mathematical language, we 
first need to define an open ball. Imagine a balloon of radius r, centered around 
a point x. The open ball of radius r around x consists of all points y inside 
the balloon, but not the skin of the balloon itself: whatever y you choose, the 
distance between x and y is always less than the radius r. 


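Note that $|\mathbf x - \mathbf y|$ must be less than $r$ for the ball to be open; it 
cannot be $= r$. 


Definition 1.5.2 (Open ball). For any $\mathbf x \in \mathbb{R}^n$ and any $r > 0$, the open 
ball of radius $r$ around $\mathbf x$ is the subset 

$$B_r(\mathbf x) = \{\, \mathbf y \in \mathbb{R}^n \text{ such that } |\mathbf x - \mathbf y| < r \,\}. \tag{1.5.2}$$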


We use a subscript to indicate the radius of a ball $B$; the argument gives the 
center of the ball: a ball of radius 2 centered at the point $\mathbf y$ would be written 
$B_2(\mathbf y)$. 

The symbol $\subset$ used in Definition 1.5.3 means "subset of." If you are not 
familiar with the symbols used in set theory, you may wish to read the 
discussion of set theoretic notation in Section 0.3. 

A subset is open if every point in it is contained in an open ball that itself 
is contained in the subset: 


Definition 1.5.3 (Open set of $\mathbb{R}^n$). A subset $U \subset \mathbb{R}^n$ is open in $\mathbb{R}^n$ if for 
every point $\mathbf x \in U$, there exists $r > 0$ such that $B_r(\mathbf x) \subset U$. 


12 It is possible to make sense of the notion of derivatives in closed sets, but these 
results, due to the great American mathematician Hassler Whitney, are extremely 
difficult, well beyond the scope of this book. 



Note that parentheses denote an open set: $(a, b)$, while brackets denote a 
closed set: $[a, b]$. Sometimes, especially in France, backwards brackets are 
used to denote an open set: $]a, b[ \; = (a, b)$. 


The use of the word domain in Example 1.5.6 is not really mathematically 
correct: a function is the triple of 

(1) a set $X$: the domain; 

(2) a set $Y$: the range; 

(3) a rule $f$ that associates an element $f(x) \in Y$ to each element $x \in X$. 

Strictly speaking, the formula $1/(y - x^2)$ isn't a function until we have 
specified the domain and the range, and nobody says that the domain must be 
the complement of the parabola of equation $y = x^2$: it could be any subset of 
this set. Mathematicians usually disregard this, and think of a formula as 
defining a function, whose domain is the natural domain of the formula, i.e., 
those arguments for which the formula is defined. 

However close a point in the open subset $U$ is to the "fence" of the set, by 
choosing $r$ small enough, you can surround it with an open ball in $\mathbb{R}^n$ that is 
entirely in the open set, not touching the fence. 

A set that is not open is not necessarily closed: an open set owns none of its 
fence. A closed set owns all of its fence: 

Definition 1.5.4 (Closed set of $\mathbb{R}^n$). A closed set of $\mathbb{R}^n$, $C \subset \mathbb{R}^n$, is a set 
whose complement $\mathbb{R}^n - C$ is open. 


Example 1.5.5 (Open sets). 

(1) If $a < b$, then the interval 

$$(a, b) = \{\, x \in \mathbb{R} \mid a < x < b \,\} \tag{1.5.3}$$

is open. Indeed, if $x \in (a, b)$, set $r = \min\{x - a, b - x\}$. Both these 
numbers are strictly positive, since $a < x < b$, and so is their minimum. 
Then the ball $\{\, y \mid |y - x| < r \,\}$ is a subset of $(a, b)$. 

(2) The infinite intervals $(a, \infty)$, $(-\infty, b)$ are also open, but the intervals 

$$(a, b] = \{\, x \in \mathbb{R} \mid a < x \le b \,\} \quad\text{and}\quad [a, b] = \{\, x \in \mathbb{R} \mid a \le x \le b \,\} \tag{1.5.4}$$

are not. 

(3) The rectangle 

$$(a, b) \times (c, d) = \left\{ \begin{pmatrix} x \\ y \end{pmatrix} \in \mathbb{R}^2 \;\middle|\; a < x < b,\ c < y < d \right\} \tag{1.5.5}$$

is also open. △ 
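The choice $r = \min\{x - a, b - x\}$ in part (1) can be tried out on a computer. The sketch below is only an illustration under our own naming (`ball_radius_in_interval` and `ball_inside_interval` are hypothetical helpers): sampling finitely many points of the ball is a sanity check, not a proof.

```python
def ball_radius_in_interval(x, a, b):
    """Radius r = min(x - a, b - x) used in Example 1.5.5 (1).

    Only meaningful for a point strictly inside the interval (a, b).
    """
    if not (a < x < b):
        raise ValueError("x must satisfy a < x < b")
    return min(x - a, b - x)


def ball_inside_interval(x, r, a, b, samples=1000):
    """Check on sample points that {y : |y - x| < r} lies inside (a, b)."""
    for k in range(1, samples):
        t = k / samples              # 0 < t < 1, so |y - x| = t * r < r
        for y in (x - t * r, x + t * r):
            if not (a < y < b):
                return False
    return True


if __name__ == "__main__":
    x, a, b = 0.25, 0.0, 1.0
    r = ball_radius_in_interval(x, a, b)
    print(r, ball_inside_interval(x, r, a, b))
```

For an endpoint of $(a, b]$ or $[a, b]$ no positive radius works, which is why `ball_radius_in_interval` refuses a point that is not strictly between $a$ and $b$.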


Natural domains of functions 

We will often be interested in whether the domain of definition of a function — 
what we will call its natural domain — is open or closed, or neither. 

Example 1.5.6 (Checking whether the domain of a function is open 
or closed). The natural domain of the function $1/(y - x^2)$ is the subset of $\mathbb{R}^2$ 
where the denominator is not 0, i.e., the natural domain is the complement of 
the parabola $P$ of equation $y = x^2$. This is more or less obviously an open set, 
as suggested by Figure 1.5.3. 

Figure 1.5.3. It seems obvious that given a point off the parabola $P$, you can 
draw a disk around the point that avoids the parabola. Actually finding a 
formula for the radius of such a disk is more tedious than you might expect. 

We can see it rigorously as follows. Suppose $\begin{pmatrix} a \\ b \end{pmatrix} \notin P$, so that $|b - a^2| = 
C > 0$, for some constant $C$. Then if 

$$|u|, |v| < \min\left\{ 1, \frac{C}{3}, \frac{C}{6|a|} \right\} = r, \tag{1.5.6}$$

we have¹³ 

$$|(b + v) - (a + u)^2| = |b - a^2 + v - 2au - u^2| \ge C - \bigl(|v| + 2|a||u| + |u|^2\bigr) > C - \left(\frac{C}{3} + \frac{C}{3} + \frac{C}{3}\right) = 0.$$

Therefore, $\begin{pmatrix} a + u \\ b + v \end{pmatrix}$ is not on the parabola. This means that we can draw 
a square of side length $2r$ around the point $\begin{pmatrix} a \\ b \end{pmatrix}$ and know that any point in 
that open square will not be on the parabola. (We used that since $|u| < 1$, we 
have $|u|^2 < |u|$.) 

If we had defined an open set in terms of squares around points rather than 
balls around points, we would now be finished: we would have shown that the 
complement of the parabola $P$ is open. But to be complete we now need to point 
out the obvious fact that there is an open ball that fits in that open square. 

We do this by saying that if $\left| \begin{pmatrix} a + u \\ b + v \end{pmatrix} - \begin{pmatrix} a \\ b \end{pmatrix} \right| < r$ (i.e., if $\begin{pmatrix} a + u \\ b + v \end{pmatrix}$ is in the 
circle of radius $r$ around $\begin{pmatrix} a \\ b \end{pmatrix}$) then $|u|, |v| < r$ (i.e., it is also in the square 
of side length $2r$ around $\begin{pmatrix} a \\ b \end{pmatrix}$). Therefore the complement of the parabola is 
open. △ 

This seems like a lot of work to prove something that was obvious to begin 
with. However, now we can actually compute the radius of an open disk around 
any point off the parabola. For the point $\begin{pmatrix} 2 \\ 3 \end{pmatrix}$, what is the radius of such a 
disk? Check your answer below.¹⁴ The answer you get will not be sharp: there 
are points between that disk and the parabola. Exercise 1.5.6 asks you to find 
a sharper result; Exercise 1.5.7 asks you to find the exact result. △ 
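The radius of Equation 1.5.6 is easy to compute mechanically. The following is a minimal sketch under our own naming (`safe_radius` and `disk_avoids_parabola` are hypothetical helpers, not from the text); the term $C/(6|a|)$ is simply left out of the minimum when $a = 0$, since it is then infinite. The second function samples points of the disk and confirms that $y - x^2$ keeps the sign it has at the center, so the disk never meets the parabola at the sampled points.

```python
import math


def safe_radius(a, b):
    """Radius r = min{1, C/3, C/(6|a|)} of Equation 1.5.6, with C = |b - a^2|."""
    C = abs(b - a * a)
    if C == 0:
        raise ValueError("the point lies on the parabola y = x^2")
    candidates = [1.0, C / 3]
    if a != 0:                       # C/(6|a|) is infinite when a = 0
        candidates.append(C / (6 * abs(a)))
    return min(candidates)


def disk_avoids_parabola(a, b, r, directions=200):
    """Sample the open disk of radius r around (a, b) and check y != x^2."""
    center_sign = b - a * a
    for i in range(directions):
        theta = 2 * math.pi * i / directions
        for frac in (0.25, 0.5, 0.75, 0.999):
            u = frac * r * math.cos(theta)
            v = frac * r * math.sin(theta)
            # y - x^2 must keep the same sign as at the center of the disk.
            if ((b + v) - (a + u) ** 2) * center_sign <= 0:
                return False
    return True


if __name__ == "__main__":
    r = safe_radius(2, 3)
    print(r, disk_avoids_parabola(2, 3, r))
```

As the text warns, this radius is safe but not sharp: larger disks around the same point also avoid the parabola.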

Example 1.5.7 (Natural domain). What is the natural domain of the 
function 

$$f\begin{pmatrix} x \\ y \end{pmatrix} = \sqrt{\frac{x}{y}},$$

i.e., those arguments for which the formula is defined? If the argument of the 
square root is non-negative, the square root can be evaluated, so the first and 
the third quadrants are in the natural domain. The $x$-axis is not (since $y = 0$ 
there), but the $y$-axis with the origin removed is in the natural domain, since 
$x/y$ is zero there. So the natural domain is the region drawn in Figure 1.5.4. △ 

13 How did we make up this proof? We fiddled, starting at the end and seeing what 
$r$ should be in order for the computation to come out. Note that if $a = 0$, then $C/(6|a|)$ 
is infinite, but this does not affect the choice of $r$ since we are choosing a minimum. 

14 For $x = 2$, $y = 3$ we have $C = |y - x^2| = |3 - 4| = 1$, so $r = \min\left\{1, \frac13, \frac1{12}\right\} = 
1/12$. The open disk of radius $1/12$ around $\begin{pmatrix} 2 \\ 3 \end{pmatrix}$ does not intersect the parabola. 

Figure 1.5.4. The natural domain of the function $f\begin{pmatrix} x \\ y \end{pmatrix} = \sqrt{x/y}$ 
is neither open nor closed. 

Infinite decimals are actually limits of convergent sequences. If 
$a_0 = 3$, $a_1 = 3.1$, $a_2 = 3.14$, ..., $a_n = \pi$ to $n$ decimal places, 
how large does $M$ have to be so that if $n > M$, then $|a_n - \pi| < 10^{-3}$? 
The answer is $M = 3$: $\pi - 3.141 = .0005926\ldots$. The same 
argument holds for any real number. 


Several similar examples are suggested in Exercise 1.5.8. 


Convergence 

Unless we state explicitly that a sequence is finite, sequences will be infinite. 
A sequence of points $\mathbf a_1, \mathbf a_2, \ldots$ converges to $\mathbf a$ if, by choosing a point far 
enough out in the sequence, you can make the distance between all subsequent 
points in the sequence and $\mathbf a$ as small as you like: 

Definition 1.5.8 (Convergent sequence). A sequence of points $\mathbf a_1, \mathbf a_2, \ldots$ 
in $\mathbb{R}^n$ converges to $\mathbf a \in \mathbb{R}^n$ if for all $\epsilon > 0$ there exists $M$ such that when 
$m > M$, then $|\mathbf a_m - \mathbf a| < \epsilon$. We then call $\mathbf a$ the limit of the sequence. 

Exactly the same definition applies to a sequence of vectors: just replace $\mathbf a$ 
in Definition 1.5.8 by $\vec{\mathbf a}$, and substitute the word "vector" for "point." 

Convergence in $\mathbb{R}^n$ is just $n$ separate convergences in $\mathbb{R}$: 

Proposition 1.5.9. A sequence $(\mathbf a_m) = \mathbf a_1, \mathbf a_2, \ldots$ with $\mathbf a_i \in \mathbb{R}^n$ converges 
to $\mathbf a$ if and only if each coordinate converges; i.e., if for all $j$ with $1 \le j \le n$ 
the coordinate $(a_m)_j$ converges to $a_j$, the $j$th coordinate of the limit $\mathbf a$. 


The proof is a good setting for understanding how the $\epsilon$-$M$ game is played 
(where $M$ is the $M$ of Definition 1.5.8). You should imagine that your opponent 
gives you an epsilon and challenges you to find an $M$ that works, i.e., an $M$ 
such that when $m > M$, then $|(a_m)_j - a_j| < \epsilon$. You get extra points for style 
for finding a small $M$, but it is not necessary in order to win the game. 


Proof. Let us first see the easy direction: the statement that 

$$\mathbf a_m = \begin{bmatrix} (a_m)_1 \\ \vdots \\ (a_m)_n \end{bmatrix}$$

converges implies that for each $j = 1, \ldots, n$, the sequence of numbers $(a_m)_j$ 
converges. The challenger hands you an epsilon. Fortunately you have a 
teammate who knows how to play the game for the sequence $\mathbf a_m$, and you hand her 
the epsilon you just got. She promptly hands you back an $M$ with the 
guarantee that when $m > M$, then $|\mathbf a_m - \mathbf a| < \epsilon$ (since the sequence $\mathbf a_m$ is convergent). 
The length of the vector $\mathbf a_m - \mathbf a$ is 

$$|\mathbf a_m - \mathbf a| = \sqrt{\bigl((a_m)_1 - a_1\bigr)^2 + \cdots + \bigl((a_m)_n - a_n\bigr)^2},$$

so you give that $M$ to your challenger, with the argument that 

$$|(a_m)_j - a_j| \le |\mathbf a_m - \mathbf a| < \epsilon. \tag{1.5.9}$$

He promptly concedes defeat. 

This is typical of all proofs involving convergence and limits: you are given 
an $\epsilon$ and challenged to come up with a $\delta$ (or $M$ or whatever) such that a 
certain quantity is less than $\epsilon$. 

Your "challenger" can give you any $\epsilon > 0$ he likes; statements concerning 
limits and continuity are of the form "for all epsilon, there exists ... ." 

Now let us try the opposite direction: the convergence of the coordinate 
sequences $(a_m)_j$ implies the convergence of the sequence $\mathbf a_m$. Again the challenger 
hands you an $\epsilon > 0$. This time you have $n$ teammates, each of whom knows how 
to play the game for a single convergent coordinate sequence $(a_m)_j$. After a 
bit of thought and scribbling on a piece of paper, you pass along $\epsilon/\sqrt n$ to each 
of them. They dutifully return to you cards containing numbers $M_1, \ldots, M_n$, 
with the guarantee that 

$$|(a_m)_j - a_j| < \frac{\epsilon}{\sqrt n} \quad\text{when } m > M_j. \tag{1.5.10}$$

You sort through the cards and choose the one with the largest number, 

$$M = \max\{M_1, \ldots, M_n\}, \tag{1.5.11}$$

which you pass on to the challenger with the following message: 

if $m > M$, then $m > M_j$ for each $j = 1, \ldots, n$, so $|(a_m)_j - a_j| < \epsilon/\sqrt n$, so 

$$|\mathbf a_m - \mathbf a| = \sqrt{\bigl((a_m)_1 - a_1\bigr)^2 + \cdots + \bigl((a_m)_n - a_n\bigr)^2} < \sqrt{\left(\frac{\epsilon}{\sqrt n}\right)^2 + \cdots + \left(\frac{\epsilon}{\sqrt n}\right)^2} = \sqrt{\frac{n \epsilon^2}{n}} = \epsilon. \quad\square \tag{1.5.12}$$
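The second half of the proof is effectively an algorithm, and it can be played out in code. The sketch below makes several assumptions that are ours, not the text's: the sequence $\mathbf a_m = (1/m,\ 2 + 1/m^2)$ with limit $(0, 2)$ is a hypothetical example, and `coordinate_M` plays the role of a teammate who happens to know an explicit $M_j$ for a coordinate of the form $1/m^p$.

```python
import math

LIMIT = (0.0, 2.0)                   # limit of the hypothetical sequence below


def sequence(m):
    """Hypothetical sequence a_m = (1/m, 2 + 1/m^2) in R^2."""
    return (1.0 / m, 2.0 + 1.0 / m ** 2)


def coordinate_M(eta, power):
    """Teammate: return M with 1/m**power < eta whenever m > M."""
    return math.ceil((1.0 / eta) ** (1.0 / power))


def find_M(eps):
    """Pass eps/sqrt(n) to each teammate and keep the largest card (1.5.11)."""
    n = len(LIMIT)
    eta = eps / math.sqrt(n)
    cards = [coordinate_M(eta, 1), coordinate_M(eta, 2)]
    return max(cards)


def distance(p, q):
    """Euclidean distance |p - q|."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


if __name__ == "__main__":
    eps = 0.01
    M = find_M(eps)
    worst = max(distance(sequence(m), LIMIT) for m in range(M + 1, M + 2000))
    print(M, worst < eps)
```

The $M$ returned is not the smallest possible one; as the text says, a small $M$ earns style points but is not needed to win the game.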


The scribbling you did was to figure out that handing $\epsilon/\sqrt n$ to your 
teammates would work. What if you can't figure out how to "slice up" $\epsilon$ so that the 
final answer will be precisely $\epsilon$? In that case, just work directly with $\epsilon$ and see 
where it takes you. If you use $\epsilon$ instead of $\epsilon/\sqrt n$ in Equations 1.5.10 and 1.5.12, 
you will end up with 

$$|\mathbf a_m - \mathbf a| < \epsilon \sqrt n. \tag{1.5.13}$$

You can then see that to land on the exact answer, you should have chosen 
$\epsilon/\sqrt n$. 

In fact, the answer in Equation 1.5.13 is good enough and you don't really 
need to go back and fiddle. Intuitively, "less than epsilon" for any $\epsilon > 0$ and 
"less than some quantity that goes to 0 when epsilon goes to 0" achieve the 
same goal: showing that you can make some quantity arbitrarily small. The 
following theorem states this precisely; you are asked to prove it in Exercise 
1.5.12. 

Theorem 1.5.10 (Elegance is not required). Let $u(\epsilon)$, with $\epsilon > 0$, be 
a function such that $u(\epsilon) \to 0$ as $\epsilon \to 0$. Then the following two statements 
are equivalent: 

(1) For all $\epsilon > 0$, there exists a $\delta > 0$ such that when $|\mathbf x - \mathbf x_0| < \delta$, then 
$|f(\mathbf x) - f(\mathbf x_0)| < u(\epsilon)$. 

(2) For all $\epsilon > 0$, there exists a $\delta > 0$ such that when $|\mathbf x - \mathbf x_0| < \delta$, then 
$|f(\mathbf x) - f(\mathbf x_0)| < \epsilon$. 


In practice, the first statement is the one mathematicians use most often. 

The following result is of great importance, saying that the notion of limit 
is well defined: if the limit is something, then it isn't something else. It could 
be reduced to the one-dimensional case as above, but we will use it as an 
opportunity to play the $\epsilon$, $M$ game in more sober fashion. 

Proposition 1.5.11. If the sequence of points $\mathbf a_1, \mathbf a_2, \ldots$ in $\mathbb{R}^n$ converges to 
$\mathbf a$ and to $\mathbf b$, then $\mathbf a = \mathbf b$. 


Proof. Suppose $\mathbf a \ne \mathbf b$, and set $\epsilon_0 = (|\mathbf a - \mathbf b|)/4$; our assumption $\mathbf a \ne \mathbf b$ implies 
that $\epsilon_0 > 0$. Thus, by the definition of the limit, there exists $M_1$ such that 
$|\mathbf a_n - \mathbf a| < \epsilon_0$ when $n > M_1$, and $M_2$ such that $|\mathbf a_n - \mathbf b| < \epsilon_0$ when $n > M_2$. 
Set $M = \max\{M_1, M_2\}$. If $n > M$, then by the triangle inequality (Theorem 
1.4.9), 

$$|\mathbf a - \mathbf b| = |(\mathbf a - \mathbf a_n) + (\mathbf a_n - \mathbf b)| \le \underbrace{|\mathbf a - \mathbf a_n|}_{< \epsilon_0} + \underbrace{|\mathbf a_n - \mathbf b|}_{< \epsilon_0} < 2\epsilon_0 = |\mathbf a - \mathbf b|/2. \tag{1.5.14}$$

This is a contradiction, so $\mathbf a = \mathbf b$. □ 


Theorem 1.5.13 states rules concerning limits. First, we need to define a 
bounded set. 

Definition 1.5.12 (Bounded set). A subset $X \subset \mathbb{R}^n$ is bounded if it is 
contained in a ball in $\mathbb{R}^n$ centered at the origin: 

$$X \subset B_R(\mathbf 0) \quad\text{for some } R < \infty. \tag{1.5.15}$$


The ball containing the bounded set can be very big, but its radius must be 
finite. 


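Recall that $B_R$ denotes a ball of radius $R$; the ball $B_R(\mathbf 0)$ is a ball 
of radius $R$ centered at the origin. 

Theorem 1.5.13 (Limits). Let $\mathbf a_m$, $\mathbf b_m$ be two sequences of points in $\mathbb{R}^n$, 
and $c_m$ be a sequence of numbers. Then 

(a) If $\mathbf a_m$ and $\mathbf b_m$ both converge, then so does $\mathbf a_m + \mathbf b_m$, and 

$$\lim_{m \to \infty} (\mathbf a_m + \mathbf b_m) = \lim_{m \to \infty} \mathbf a_m + \lim_{m \to \infty} \mathbf b_m.$$

(b) If $\mathbf a_m$ and $c_m$ both converge, then so does $c_m \mathbf a_m$, and 

$$\lim_{m \to \infty} (c_m \mathbf a_m) = \left( \lim_{m \to \infty} c_m \right) \left( \lim_{m \to \infty} \mathbf a_m \right).$$

(c) If $\mathbf a_m$ and $\mathbf b_m$ both converge, then so does $\mathbf a_m \cdot \mathbf b_m$, and 

$$\lim_{m \to \infty} (\mathbf a_m \cdot \mathbf b_m) = \left( \lim_{m \to \infty} \mathbf a_m \right) \cdot \left( \lim_{m \to \infty} \mathbf b_m \right).$$

(d) If $\mathbf a_m$ is bounded and $c_m$ converges to 0, then 

$$\lim_{m \to \infty} (c_m \mathbf a_m) = \mathbf 0.$$

Illustration for part (d): Let $c_m = 1/m$ and $\mathbf a_m =$ [...]. 
Then $c_m$ converges to 0, but $\lim_{m \to \infty} (c_m \mathbf a_m) \ne \mathbf 0$. 
Why is the limit not $\mathbf 0$ as in part (d)? Because $\mathbf a_m$ is not bounded. 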

Exercise 1.5.13 asks you to prove the converse: if every convergent 
sequence in a set $C \subset \mathbb{R}^n$ converges to a point in $C$, then $C$ 
is closed. 


We will not prove Theorem 1.5.13, since Proposition 1.5.9 reduces it to the 
one-dimensional case; the proof is left as Exercise 1.5.16. 

There is an intimate relationship between limits of sequences and closed sets: 
closed sets are “closed under limits.” 

Proposition 1.5.14. If $\mathbf x_1, \mathbf x_2, \ldots$ is a convergent sequence in a closed set 
$C \subset \mathbb{R}^n$, converging to a point $\mathbf x_0 \in \mathbb{R}^n$, then $\mathbf x_0 \in C$. 


Intuitively, this is not hard to see: a convergent sequence in a closed set can't 
approach a point outside the set without leaving the set. (But a sequence in a 
set that is not closed can converge to a point of the fence that is not in the set.) 

Proof. Indeed, if $\mathbf x_0 \notin C$, then $\mathbf x_0 \in (\mathbb{R}^n - C)$, which is open, so there exists 
$r > 0$ such that $B_r(\mathbf x_0) \subset (\mathbb{R}^n - C)$. Then for all $m$ we have $|\mathbf x_m - \mathbf x_0| \ge r$. 
On the other hand, by the definition of convergence, we must have that for any 
$\epsilon > 0$ we have $|\mathbf x_m - \mathbf x_0| < \epsilon$ for $m$ sufficiently large. Taking $\epsilon = r/2$, we see 
that this is a contradiction. □ 


Subsequences 

Subsequences are a useful tool, as we will see in Section 1.6. They are not 
particularly difficult, but they require somewhat complicated indices, which are 
scary on the page and tedious to type. 

Sometimes the subsequence $a_{i(1)}, a_{i(2)}, \ldots$ is denoted $a_{i_1}, a_{i_2}, \ldots$. 

The proof of Proposition 1.5.16 is left as Exercise 1.5.17, largely 
to provide practice with the language. 

The closure of $A$ is thus $A$ plus its fence. If $A$ is closed, then 
$\overline A = A$. 


Definition 1.5.15 (Subsequence). A subsequence of a sequence $a_1, a_2, \ldots$ 
is a sequence formed by taking first some element of the original sequence, 
then another element further on, and yet another, yet further along .... It 
is denoted $a_{i(1)}, a_{i(2)}, \ldots$, where $i(k) > i(j)$ when $k > j$. 


You might take all the even terms, or all the odd terms, or all those whose 
index is a prime, etc. Of course, any sequence is a subsequence of itself. The 
index i is the function that associates to the position in the subsequence the 
position of the same entry in the original sequence. For example, if the original 
sequence is 


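$$\underbrace{1}_{a_1},\ \underbrace{\tfrac12}_{a_2},\ \underbrace{\tfrac13}_{a_3},\ \underbrace{\tfrac14}_{a_4},\ \underbrace{\tfrac15}_{a_5},\ \underbrace{\tfrac16}_{a_6},\ \ldots$$

and the subsequence is 

$$\underbrace{\tfrac12}_{a_{i(1)}},\ \underbrace{\tfrac14}_{a_{i(2)}},\ \underbrace{\tfrac16}_{a_{i(3)}},\ \ldots$$

we see that $i(1) = 2$, since $1/2$ is the second entry of the original sequence. 
Similarly, $i(2) = 4$, $i(3) = 6$, .... (In specific cases, figuring out what $i(1), i(2)$, 
etc. correspond to can be a major challenge.) 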
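The index function of Definition 1.5.15 is just an increasing function on the positive integers. Here is a minimal sketch for the example above, using exact fractions; the names `a`, `i` and `subsequence` are ours.

```python
from fractions import Fraction


def a(n):
    """Original sequence a_n = 1/n, for n = 1, 2, 3, ..."""
    return Fraction(1, n)


def i(k):
    """Index function: position in the original sequence of the k-th entry
    of the subsequence. Here we take the even-indexed terms."""
    return 2 * k


def subsequence(k):
    """k-th entry a_{i(k)} of the subsequence."""
    return a(i(k))


if __name__ == "__main__":
    print([str(subsequence(k)) for k in (1, 2, 3)])
    # The index function must be strictly increasing: i(k) > i(j) when k > j.
    print(all(i(k + 1) > i(k) for k in range(1, 100)))
```

Replacing `i` by any other strictly increasing function (for instance one that returns the $k$th prime) gives another subsequence of the same sequence.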


Proposition 1.5.16. If a sequence $a_k$ converges to $a$, then any subsequence 
of $a_k$ converges to the same limit. 


Limits of functions 

Limits like $\lim_{\mathbf x \to \mathbf x_0} f(\mathbf x)$ can only be defined if you can approach $\mathbf x_0$ by points 
where $f$ can be evaluated. The notion of closure of a set is designed to make 
this precise. 

Definition 1.5.17 (Closure). If $A \subset \mathbb{R}^n$ is a subset, the closure of $A$, 
denoted $\overline A$, is the set of all limits of sequences in $A$ that converge in $\mathbb{R}^n$. 

For example, if $A = (0, 1)$ then $\overline A = [0, 1]$; the point 0 is the limit of the 
sequence $1/n$, which is a sequence in $A$ and converges to a point in $\mathbb{R}$. 

When $\mathbf x_0$ is in the closure of the domain of $f$, we can define the limit of 
a function, $\lim_{\mathbf x \to \mathbf x_0} f(\mathbf x)$. Of course, this includes the case when $\mathbf x_0$ is in the 
domain of $f$, but the really interesting case is when it is in the boundary of the 
domain. 

Example 1.5.18. (a) If $A = (0, 1)$ then $\overline A = [0, 1]$, so that 0 and 1 are in $\overline A$. 
Thus, it makes sense to talk about 

$$\lim_{x \to 0} (1 + x)^{1/x} \tag{1.5.16}$$

because although you cannot evaluate the function at 0, the natural domain of 
the function contains 0 in its closure. 

(b) The point $\begin{pmatrix} 0 \\ 0 \end{pmatrix}$ is in the closure of 

$$U = \left\{ \begin{pmatrix} x \\ y \end{pmatrix} \in \mathbb{R}^2 \;\middle|\; 0 < x^2 + y^2 < 1 \right\} \quad\text{(the disk with the origin removed).}$$

(c) The point $\begin{pmatrix} 0 \\ 0 \end{pmatrix}$ is also in the closure of $U$ (the region between two 
parabolas touching at the origin, shown in Figure 1.5.5): 

$$U = \left\{ \begin{pmatrix} x \\ y \end{pmatrix} \in \mathbb{R}^2 \;\middle|\; |y| < x^2 \right\}. \quad\triangle$$

Figure 1.5.5. The region in Example 1.5.18 (c). You can approach the origin 
from this region, but only in rather special ways. 

Definition 1.5.19 is not standard in the United States but is quite common in 
France. The standard version substitutes $0 < |\mathbf x - \mathbf x_0| < \delta$ for our 
$|\mathbf x - \mathbf x_0| < \delta$. The definition we have adopted makes little difference in 
applications, but has the advantage that allowing for the case where 
$\mathbf x = \mathbf x_0$ makes limits better behaved under composition. With the standard 
version, Theorem 1.5.22 is not true. 

A mapping $\mathbf f : \mathbb{R}^n \to \mathbb{R}^m$ is an "$\mathbb{R}^m$-valued" mapping; its argument is in 
$\mathbb{R}^n$ and its values are in $\mathbb{R}^m$. Often such mappings are called "vector-valued" 
mappings (or functions), but usually we are thinking of its values as points 
rather than vectors. Note that we denote an $\mathbb{R}^m$-valued mapping whose values 
are points in $\mathbb{R}^m$ with a boldface letter without arrow: $\mathbf f$. Sometimes we do 
want to think of the values of a mapping $\mathbb{R}^n \to \mathbb{R}^n$ as vectors: when we are 
thinking of vector fields. We denote a vector field with an arrow: 
$\vec F$ or $\vec{\mathbf f}$. 

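Definition 1.5.19 (Limit of a function). A function $\mathbf f : U \to \mathbb{R}^m$ has 
the limit $\mathbf a$ at $\mathbf x_0$: 

$$\lim_{\mathbf x \to \mathbf x_0} \mathbf f(\mathbf x) = \mathbf a \tag{1.5.17}$$

if $\mathbf x_0 \in \overline U$ and if for all $\epsilon > 0$, there exists $\delta > 0$ such that when $|\mathbf x - \mathbf x_0| < \delta$, 
and $\mathbf x \in U$, then $|\mathbf f(\mathbf x) - \mathbf a| < \epsilon$. 

That is, as $\mathbf f$ is evaluated at a point $\mathbf x$ arbitrarily close to $\mathbf x_0$, then $\mathbf f(\mathbf x)$ will 
be arbitrarily close to $\mathbf a$. 

Since we are not requiring that $\mathbf x_0 \in U$, $\mathbf f(\mathbf x_0)$ is not necessarily defined, but 
if it is defined, then for the limit to exist we must have 

$$\lim_{\mathbf x \to \mathbf x_0} \mathbf f(\mathbf x) = \mathbf f(\mathbf x_0). \tag{1.5.18}$$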


Limits of mappings with values in R m 

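As is the case for sequences (Proposition 1.5.9), it is the same thing to claim 
that an $\mathbb{R}^m$-valued mapping $\mathbf f : \mathbb{R}^n \to \mathbb{R}^m$ has a limit, and that its components 
have limits, as shown in Proposition 1.5.20. Such a mapping is sometimes 
written in terms of the "sub-functions" (coordinate functions) that define each 
new coordinate. For example, the mapping $\mathbf f : \mathbb{R}^2 \to \mathbb{R}^3$, 

$$\mathbf f \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} xy \\ x^2 y \\ x - y \end{pmatrix} \quad\text{can be written}\quad \mathbf f = \begin{pmatrix} f_1 \\ f_2 \\ f_3 \end{pmatrix},$$

where $f_1(\mathbf x) = xy$, $f_2(\mathbf x) = x^2 y$, and $f_3(\mathbf x) = x - y$. 

Recall (Definition 1.5.17) that $\overline U$ denotes the closure of $U$: the 
subset of $\mathbb{R}^n$ made up of the set of all limits of sequences in $U$ which 
converge in $\mathbb{R}^n$. 

If you gave $\epsilon/\sqrt m$ to your teammates, as in the proof of Proposition 1.5.9, 
you would end up with 

$$|\mathbf f(\mathbf x) - \mathbf a| < \epsilon,$$

rather than $|\mathbf f(\mathbf x) - \mathbf a| < \epsilon \sqrt m$. In some sense this is more "elegant." 
But Theorem 1.5.10 says that it is mathematically just as good to arrive at 
less than or equal to epsilon times some fixed number or, more generally, 
anything that goes to 0 when $\epsilon$ goes to 0. 


Proposition 1.5.20. Let 

$$\mathbf f(\mathbf x) = \begin{pmatrix} f_1(\mathbf x) \\ \vdots \\ f_m(\mathbf x) \end{pmatrix}$$

be a function defined on a domain $U \subset \mathbb{R}^n$, and let $\mathbf x_0 \in \mathbb{R}^n$ be a point in $\overline U$. 
Then $\lim_{\mathbf x \to \mathbf x_0} \mathbf f = \mathbf a$ exists if and only if each of $\lim_{\mathbf x \to \mathbf x_0} f_i = a_i$ exists, and 

$$\lim_{\mathbf x \to \mathbf x_0} \mathbf f = \begin{pmatrix} \lim_{\mathbf x \to \mathbf x_0} f_1 \\ \vdots \\ \lim_{\mathbf x \to \mathbf x_0} f_m \end{pmatrix} = \begin{pmatrix} a_1 \\ \vdots \\ a_m \end{pmatrix}. \tag{1.5.20}$$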


Proof. Let's go through the picturesque description again. The proof has an 
"if" part and an "only if" part. 

For the "if" part, the challenger hands you an $\epsilon > 0$. You pass it on to a 
teammate who returns a $\delta$ with the guarantee that when $|\mathbf x - \mathbf x_0| < \delta$, and 
$\mathbf f(\mathbf x)$ is defined, then $|\mathbf f(\mathbf x) - \mathbf a| < \epsilon$. You pass on the same $\delta$, and $a_i$, to the 
challenger, with the explanation: 

$$|f_i(\mathbf x) - a_i| \le |\mathbf f(\mathbf x) - \mathbf a| < \epsilon. \tag{1.5.21}$$

For the "only if" part, the challenger hands you an $\epsilon > 0$. You pass this $\epsilon$ to 
your teammates, who know how to deal with the coordinate functions. They 
hand you back $\delta_1, \ldots, \delta_m$. You look through these, and select the smallest one, 
which you call $\delta$, and pass on to the challenger, with the message 

"If $|\mathbf x - \mathbf x_0| < \delta$, then $|\mathbf x - \mathbf x_0| < \delta \le \delta_i$, so that $|f_i(\mathbf x) - a_i| < \epsilon$, so that 

$$|\mathbf f(\mathbf x) - \mathbf a| = \sqrt{(f_1(\mathbf x) - a_1)^2 + \cdots + (f_m(\mathbf x) - a_m)^2} < \underbrace{\sqrt{\epsilon^2 + \cdots + \epsilon^2}}_{m \text{ terms}} = \epsilon \sqrt m, \tag{1.5.22}$$

which goes to 0 as $\epsilon$ goes to 0." You win! □ 

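Theorem 1.5.21 (Limits of functions). Let $\mathbf f$ and $\mathbf g$ be functions from 
$U \to \mathbb{R}^m$, and $h$ a function from $U \to \mathbb{R}$. 

(a) If $\lim_{\mathbf x \to \mathbf x_0} \mathbf f(\mathbf x)$ and $\lim_{\mathbf x \to \mathbf x_0} \mathbf g(\mathbf x)$ exist, then $\lim_{\mathbf x \to \mathbf x_0} (\mathbf f + \mathbf g)(\mathbf x)$ exists, 
and 

$$\lim_{\mathbf x \to \mathbf x_0} \mathbf f(\mathbf x) + \lim_{\mathbf x \to \mathbf x_0} \mathbf g(\mathbf x) = \lim_{\mathbf x \to \mathbf x_0} (\mathbf f + \mathbf g)(\mathbf x). \tag{1.5.23}$$

(b) If $\lim_{\mathbf x \to \mathbf x_0} \mathbf f(\mathbf x)$ and $\lim_{\mathbf x \to \mathbf x_0} h(\mathbf x)$ exist, then $\lim_{\mathbf x \to \mathbf x_0} h\mathbf f(\mathbf x)$ exists, 
and 

$$\lim_{\mathbf x \to \mathbf x_0} h(\mathbf x) \lim_{\mathbf x \to \mathbf x_0} \mathbf f(\mathbf x) = \lim_{\mathbf x \to \mathbf x_0} h\mathbf f(\mathbf x). \tag{1.5.24}$$

(c) If $\lim_{\mathbf x \to \mathbf x_0} \mathbf f(\mathbf x)$ exists, and $\lim_{\mathbf x \to \mathbf x_0} h(\mathbf x)$ exists and is different from 0, 
then $\lim_{\mathbf x \to \mathbf x_0} \left(\frac{\mathbf f}{h}\right)(\mathbf x)$ exists, and 

$$\frac{\lim_{\mathbf x \to \mathbf x_0} \mathbf f(\mathbf x)}{\lim_{\mathbf x \to \mathbf x_0} h(\mathbf x)} = \lim_{\mathbf x \to \mathbf x_0} \left(\frac{\mathbf f}{h}\right)(\mathbf x). \tag{1.5.25}$$

(d) If $\lim_{\mathbf x \to \mathbf x_0} \mathbf f(\mathbf x)$ and $\lim_{\mathbf x \to \mathbf x_0} \mathbf g(\mathbf x)$ exist, then so does $\lim_{\mathbf x \to \mathbf x_0} (\mathbf f \cdot \mathbf g)(\mathbf x)$, 
and 

$$\lim_{\mathbf x \to \mathbf x_0} \mathbf f(\mathbf x) \cdot \lim_{\mathbf x \to \mathbf x_0} \mathbf g(\mathbf x) = \lim_{\mathbf x \to \mathbf x_0} (\mathbf f \cdot \mathbf g)(\mathbf x). \tag{1.5.26}$$

(e) If $\mathbf f$ is bounded and $\lim_{\mathbf x \to \mathbf x_0} h(\mathbf x) = 0$, then 

$$\lim_{\mathbf x \to \mathbf x_0} (h\mathbf f)(\mathbf x) = \mathbf 0. \tag{1.5.27}$$

(f) If $\lim_{\mathbf x \to \mathbf x_0} \mathbf f(\mathbf x) = \mathbf 0$ and $h(\mathbf x)$ is bounded, then 

$$\lim_{\mathbf x \to \mathbf x_0} (h\mathbf f)(\mathbf x) = \mathbf 0. \tag{1.5.28}$$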


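We could substitute 

$$\frac{\epsilon}{2(|\mathbf g(\mathbf x_0)| + \epsilon)} \text{ for } \epsilon$$

in Equation 1.5.29, and 

$$\frac{\epsilon}{2(|\mathbf f(\mathbf x_0)| + \epsilon)} \text{ for } \epsilon$$

in Equation 1.5.30. This would give 

$$|\mathbf f(\mathbf x) \cdot \mathbf g(\mathbf x) - \mathbf f(\mathbf x_0) \cdot \mathbf g(\mathbf x_0)| < \frac{\epsilon |\mathbf g(\mathbf x)|}{2(|\mathbf g(\mathbf x_0)| + \epsilon)} + \frac{\epsilon |\mathbf f(\mathbf x_0)|}{2(|\mathbf f(\mathbf x_0)| + \epsilon)} < \epsilon.$$

Again, if you want to land exactly on epsilon, fine, but mathematically 
it is completely unnecessary. 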

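Proof. The proofs of all these statements are very similar; we will do only (d), 
which is the hardest. Choose $\epsilon$ (think of the challenger giving it to you). Then 

(1) Find a $\delta_1$ such that when $|\mathbf x - \mathbf x_0| < \delta_1$, then 

$$|\mathbf g(\mathbf x) - \mathbf g(\mathbf x_0)| < \epsilon. \tag{1.5.29}$$

(2) Next find a $\delta_2$ such that when $|\mathbf x - \mathbf x_0| < \delta_2$, then 

$$|\mathbf f(\mathbf x) - \mathbf f(\mathbf x_0)| < \epsilon. \tag{1.5.30}$$

Now set $\delta$ to be the smallest of $\delta_1$ and $\delta_2$, and consider the sequence of 
inequalities 

$$\begin{aligned}
|\mathbf f(\mathbf x) \cdot \mathbf g(\mathbf x) - \mathbf f(\mathbf x_0) \cdot \mathbf g(\mathbf x_0)| 
&= |\mathbf f(\mathbf x) \cdot \mathbf g(\mathbf x) \underbrace{{} - \mathbf f(\mathbf x_0) \cdot \mathbf g(\mathbf x) + \mathbf f(\mathbf x_0) \cdot \mathbf g(\mathbf x)}_{=0} - \mathbf f(\mathbf x_0) \cdot \mathbf g(\mathbf x_0)| \\
&\le |\mathbf f(\mathbf x) \cdot \mathbf g(\mathbf x) - \mathbf f(\mathbf x_0) \cdot \mathbf g(\mathbf x)| + |\mathbf f(\mathbf x_0) \cdot \mathbf g(\mathbf x) - \mathbf f(\mathbf x_0) \cdot \mathbf g(\mathbf x_0)| \\
&= |(\mathbf f(\mathbf x) - \mathbf f(\mathbf x_0)) \cdot \mathbf g(\mathbf x)| + |\mathbf f(\mathbf x_0) \cdot (\mathbf g(\mathbf x) - \mathbf g(\mathbf x_0))| \\
&\le |\mathbf f(\mathbf x) - \mathbf f(\mathbf x_0)|\,|\mathbf g(\mathbf x)| + |\mathbf f(\mathbf x_0)|\,|\mathbf g(\mathbf x) - \mathbf g(\mathbf x_0)| \\
&\le \epsilon |\mathbf g(\mathbf x)| + \epsilon |\mathbf f(\mathbf x_0)| = \epsilon \bigl(|\mathbf g(\mathbf x)| + |\mathbf f(\mathbf x_0)|\bigr).
\end{aligned} \tag{1.5.31}$$

Now $\mathbf g(\mathbf x)$ is a function, not a point, so we might worry that it could get big faster 
than $\epsilon$ gets small. But we know that when $|\mathbf x - \mathbf x_0| < \delta$, then $|\mathbf g(\mathbf x) - \mathbf g(\mathbf x_0)| < \epsilon$, 
which gives 

$$|\mathbf g(\mathbf x)| < \epsilon + |\mathbf g(\mathbf x_0)|. \tag{1.5.32}$$

So continuing Equation 1.5.31, we get 

$$\epsilon \bigl(|\mathbf g(\mathbf x)| + |\mathbf f(\mathbf x_0)|\bigr) < \epsilon \bigl(\epsilon + |\mathbf g(\mathbf x_0)|\bigr) + \epsilon |\mathbf f(\mathbf x_0)|, \tag{1.5.33}$$

which goes to 0 as $\epsilon$ goes to 0. □ 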

Limits also behave well with respect to compositions. 

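Theorem 1.5.22 (Limit of a composition). If $U \subset \mathbb{R}^n$, $V \subset \mathbb{R}^m$ are 
subsets, and $\mathbf f : U \to V$ and $\mathbf g : V \to \mathbb{R}^k$ are mappings, so that $\mathbf g \circ \mathbf f$ 
is defined, and if $\mathbf y_0 \overset{\text{def}}{=} \lim_{\mathbf x \to \mathbf x_0} \mathbf f(\mathbf x)$ and $\lim_{\mathbf y \to \mathbf y_0} \mathbf g(\mathbf y)$ both exist, then 
$\lim_{\mathbf x \to \mathbf x_0} \mathbf g \circ \mathbf f(\mathbf x)$ exists, and 

$$\lim_{\mathbf x \to \mathbf x_0} \mathbf g \circ \mathbf f(\mathbf x) = \lim_{\mathbf y \to \mathbf y_0} \mathbf g(\mathbf y). \tag{1.5.34}$$

There is no natural condition that will guarantee that 
$\mathbf f(\mathbf x) \ne \mathbf f(\mathbf x_0)$; if we had required $\mathbf x \ne \mathbf x_0$ in our definition of limit, 
this argument would not work. 

Proof. For all $\epsilon > 0$ there exists $\delta_1$ such that if $|\mathbf y - \mathbf y_0| < \delta_1$, then 
$|\mathbf g(\mathbf y) - \mathbf g(\mathbf y_0)| < \epsilon$. Next, there exists $\delta$ such that if $|\mathbf x - \mathbf x_0| < \delta$, then 
$|\mathbf f(\mathbf x) - \mathbf f(\mathbf x_0)| < \delta_1$. Hence 

$$|\mathbf g(\mathbf f(\mathbf x)) - \mathbf g(\mathbf f(\mathbf x_0))| < \epsilon \quad\text{when } |\mathbf x - \mathbf x_0| < \delta. \quad\square \tag{1.5.35}$$

Theorems 1.5.21 and 1.5.22 show that if you have a function $f : \mathbb{R}^n \to \mathbb{R}$ 
given by a formula involving addition, multiplication, division and composition 
of continuous functions, and which is defined at a point $\mathbf x_0$, then $\lim_{\mathbf x \to \mathbf x_0} f(\mathbf x)$ 
exists, and is equal to $f(\mathbf x_0)$. 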

Example 1.5.23 (Limit of a function). We have 

$$\lim_{\binom{x}{y} \to \binom{3}{-1}} x^2 \sin(xy) = 3^2 \sin(-3) \approx -1.27. \tag{1.5.36}$$

In fact, the function $x^2 \sin(xy)$ has limits at all points of the plane, and the 
limit is always precisely the value of the function at the point. Indeed, $xy$ is the 
product of two continuous functions, as is $x^2$, and sine is continuous at every 
point, so $\sin(xy)$ is continuous everywhere; hence also $x^2 \sin(xy)$. △ 
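The value in Equation 1.5.36 is easy to confirm numerically. The sketch below (with our own helper name `f`) evaluates the function at the point and at nearby points; shrinking the offset makes the values approach $9 \sin(-3)$, as the continuity argument predicts. A finite table of values is of course only evidence, not a proof of the limit.

```python
import math


def f(x, y):
    """The function x^2 sin(xy) of Example 1.5.23."""
    return x ** 2 * math.sin(x * y)


if __name__ == "__main__":
    target = f(3.0, -1.0)            # 9 sin(-3), roughly -1.27
    print(target)
    for h in (1e-1, 1e-3, 1e-6):
        # Approach (3, -1) along one particular direction.
        print(h, abs(f(3.0 + h, -1.0 + h) - target))
```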

In Example 1.5.23 we just have multiplication and sines, which are pretty 
straightforward. But whenever there is a division we need to worry: are we 
dividing by 0? We also need to worry whenever we see tan: what happens 
if the argument of tan is $\pi/2 + k\pi$? Similarly, log, cot, sec, csc all introduce 
complications. 

In one dimension, these problems are often addressed using l'Hôpital's rule 
(although Taylor expansions often work better). 

Much of the subtlety of limits in higher dimensions is that there are lots 
of different ways of approaching a point, and different approaches may yield 
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different limits, in which case the limit may not exist. The following example 
illustrates some of the difficulties. 


Example 1.5.24 (A case where different approaches give different lim- 
its). Consider the function 



if x ^0 
if x = 0 , 


1.5.37 


shown in Figure 1.5.6. Does lim^^ / (y) exist? 



A first idea is to approach the origin along straight lines. Set y = mi. When 
m = 0 , the limit is obviously 0 , and when m / 0 , the limit becomes 


lim 

x —*0 


m 

x 


H*l: 


1.5.38 


this limit exists and is always 0, for all values of m. Indeed, 

lim = lim 3 - = 0 . 

t-*o t 8—<x> e* 


1.5.39 


Figure 1.5.6. The function of Example 1.5.24 is continuous except at the
origin. Its value is 1/e along the "crest line" $y = \pm x^2$, but vanishes on both
axes, forming a very deep canyon along the x-axis. If you approach the origin
along any straight line $y = mx$ with $m \ne 0$, the path will get to the broad
valley along the y-axis before it reaches the origin, so along any such path the
limit of $f$ exists and is 0.

So however you approach the origin along straight lines, the limit always exists,
and is always 0. But if you set $y = kx^2$ and let $x \to 0$, approaching 0 along a
parabola, you find something quite different:

    $$ \lim_{x \to 0} |k|\, e^{-|k|} = |k|\, e^{-|k|}, $$    1.5.40

which is some number that varies between 0 and 1/e (see Exercise 1.5.18). Thus
if you approach the origin in different ways, the limits may be different. △

Continuous functions

Continuity is the fundamental notion of topology, and it arises throughout
calculus also. It took mathematicians 200 years to arrive at a correct definition.
(Historically, we have our presentation out of order: it was the search for a
usable definition of continuity that led to the correct definition of limits.)
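The function of Example 1.5.24 gives a concrete picture of what failure of continuity looks like. The following Python sketch assumes that the function is $f(x,y) = (|y|/x^2)\,e^{-|y|/x^2}$ for $x \ne 0$ and $f(0,y) = 0$, which is what the limits computed in that example indicate; the helper names `along_line` and `along_parabola` are ours, chosen only for illustration. It evaluates $f$ at points approaching the origin along a line and along a parabola.

```python
import math

def f(x, y):
    """Assumed form of the function of Example 1.5.24."""
    if x == 0:
        return 0.0
    t = abs(y) / (x * x)
    return t * math.exp(-t)

def along_line(m, x):
    """Value of f at the point (x, m*x) of the line y = m x."""
    return f(x, m * x)

def along_parabola(k, x):
    """Value of f at the point (x, k*x^2) of the parabola y = k x^2."""
    return f(x, k * x * x)

# Points approaching the origin: x = 10^-1, 10^-2, ..., 10^-5.
xs = [10.0 ** (-j) for j in range(1, 6)]

line_values = [along_line(2.0, x) for x in xs]          # tends to 0
parabola_values = [along_parabola(1.0, x) for x in xs]  # stays at 1/e

print("along y = 2x:  ", line_values)
print("along y = x^2: ", parabola_values)
```

Along the parabola the values stay at $1/e$ rather than approaching $f(0,0) = 0$, so no choice of "sufficiently close" can make $|f(x,y) - f(0,0)|$ small: the function is not continuous at the origin.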


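Definition 1.5.25 (Continuous function). Let $X \subset \mathbb{R}^n$. Then a mapping
$\mathbf{f} : X \to \mathbb{R}^m$ is continuous at $\mathbf{x}_0 \in X$ if

    $$ \lim_{\mathbf{x} \to \mathbf{x}_0} \mathbf{f}(\mathbf{x}) = \mathbf{f}(\mathbf{x}_0); $$    1.5.41

$\mathbf{f}$ is continuous on $X$ if it is continuous at every point of $X$.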



A map $\mathbf{f}$ is continuous at $\mathbf{x}_0$ if
you can make the difference between
$\mathbf{f}(\mathbf{x})$ and $\mathbf{f}(\mathbf{x}_0)$ arbitrarily
small by choosing $\mathbf{x}$ sufficiently
close to $\mathbf{x}_0$. Note that
$|\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{x}_0)|$ must be small for all $\mathbf{x}$
"sufficiently close" to $\mathbf{x}_0$. It is not
enough to find a $\delta$ such that for
one particular value of $\mathbf{x}$ the statement
is true. However, the "sufficiently
close" (i.e., the choice of $\delta$)
can be different for different values
of $\mathbf{x}_0$. (If a single $\delta$ works for all
$\mathbf{x}_0$, then the mapping is uniformly
continuous.)

We started by trying to write 
this in one simple sentence, and 
found it was impossible to do so 
and avoid mistakes. If defini- 
tions of continuity sound stilted, 
it is because any attempt to stray 
from the “for all this, there exists 
that...” inevitably leads to ambi- 
guity. 


Note that with the definition of
limit we have given, it would be
the same to say that a function
$\mathbf{f} : U \to \mathbb{R}^m$ is continuous at
$\mathbf{x}_0 \in U$ if and only if $\lim_{\mathbf{x} \to \mathbf{x}_0} \mathbf{f}(\mathbf{x})$
exists.
There is a reformulation in terms of epsilons and deltas:

Proposition 1.5.26 (Criterion for continuity). The map $\mathbf{f} : X \to \mathbb{R}^m$
is continuous at $\mathbf{x}_0$ if and only if for every $\epsilon > 0$, there exists $\delta > 0$ such that
when $|\mathbf{x} - \mathbf{x}_0| < \delta$, then $|\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{x}_0)| < \epsilon$.

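Proof. Suppose the $\epsilon, \delta$ condition is satisfied, and let $\mathbf{x}_i$, $i = 1, 2, \dots$ be a
sequence in $X$ that converges to $\mathbf{x}_0 \in X$. We must show that the sequence
$\mathbf{f}(\mathbf{x}_i)$, $i = 1, 2, \dots$ converges to $\mathbf{f}(\mathbf{x}_0)$, i.e., that for any $\epsilon > 0$, there exists $N$
such that when $i > N$ we have $|\mathbf{f}(\mathbf{x}_i) - \mathbf{f}(\mathbf{x}_0)| < \epsilon$. To find this $N$, first find
the $\delta$ such that $|\mathbf{x} - \mathbf{x}_0| < \delta$ implies that $|\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{x}_0)| < \epsilon$. Next apply the
definition of a convergent sequence to the sequence $\mathbf{x}_i$: there exists $N$ such
that if $i > N$, then $|\mathbf{x}_i - \mathbf{x}_0| < \delta$. Clearly this $N$ works.

For the converse, remember how to negate sequences of quantifiers (Section
0.2). Suppose the $\epsilon, \delta$ condition is not satisfied; then there exists $\epsilon_0 > 0$ such
that for all $\delta > 0$, there exists $\mathbf{x} \in X$ such that $|\mathbf{x} - \mathbf{x}_0| < \delta$ but
$|\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{x}_0)| \ge \epsilon_0$.
Let $\delta_n = 1/n$, and let $\mathbf{x}_n \in X$ be such a point; i.e.,

    $$ |\mathbf{x}_n - \mathbf{x}_0| < \frac{1}{n} \quad \text{and} \quad |\mathbf{f}(\mathbf{x}_n) - \mathbf{f}(\mathbf{x}_0)| \ge \epsilon_0. $$    1.5.42

The first part shows that the sequence $\mathbf{x}_n$ converges to $\mathbf{x}_0$, and the second part
shows that $\mathbf{f}(\mathbf{x}_n)$ does not converge to $\mathbf{f}(\mathbf{x}_0)$. □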

The following theorem is a reformulation of Theorem 1.5.21; the proof is left 
as Exercise 1.5.19. 

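Theorem 1.5.27 (Combining continuous mappings). Let $U$ be a subset
of $\mathbb{R}^n$, $\mathbf{f}$ and $\mathbf{g}$ mappings $U \to \mathbb{R}^m$, and $h$ a function $U \to \mathbb{R}$.

(a) If $\mathbf{f}$ and $\mathbf{g}$ are continuous at $\mathbf{x}_0$, then so is $\mathbf{f} + \mathbf{g}$.

(b) If $\mathbf{f}$ and $h$ are continuous at $\mathbf{x}_0$, then so is $h\mathbf{f}$.

(c) If $\mathbf{f}$ and $h$ are continuous at $\mathbf{x}_0$, and $h(\mathbf{x}_0) \ne 0$, then so is $\mathbf{f}/h$.

(d) If $\mathbf{f}$ and $\mathbf{g}$ are continuous at $\mathbf{x}_0$, then so is $\mathbf{f} \cdot \mathbf{g}$.

(e) If $\mathbf{f}$ is bounded and $h$ is continuous at $\mathbf{x}_0$, with $h(\mathbf{x}_0) = 0$, then $h\mathbf{f}$ is
continuous at $\mathbf{x}_0$.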

We can now write down a fairly large collection of continuous functions on
$\mathbb{R}^n$: polynomials and rational functions.

A monomial function on $\mathbb{R}^n$ is an expression of the form $x_1^{k_1} \cdots x_n^{k_n}$ with
integer exponents $k_1, \dots, k_n \ge 0$. For instance, $x^2 y z^5$ is a monomial on $\mathbb{R}^3$,
and $x_1 x_2 x_4^2$ is a monomial on $\mathbb{R}^4$ (or perhaps $\mathbb{R}^n$ with $n > 4$). A polynomial
function is a finite sum of monomials with real coefficients, like $x^2 y + 3yz$. A
rational function is a ratio of two polynomials, like $\dfrac{x+y}{xy-1}$.




Figure 1.5.7. A convergent series of vectors.
The kth partial sum is gotten by
putting the first k vectors nose to
tail.


Absolute convergence means 
that the absolute values converge. 


Proposition 1.5.30 is very important;
we use it in particular
to prove that Newton's method
converges.


Corollary 1.5.28 (Continuity of polynomials and rational functions).

(a) Any polynomial function $\mathbb{R}^n \to \mathbb{R}$ is continuous on all of $\mathbb{R}^n$.

(b) Any rational function is continuous on the subset of $\mathbb{R}^n$ where the
denominator does not vanish.

Series of vectors 

As is the case for numbers (Section 0.4), many of the most interesting sequences 
arise as partial sums of series. 

Definition 1.5.29 (Convergent series of vectors). A series $\sum_{i=1}^{\infty} \mathbf{a}_i$ is
convergent if the sequence of partial sums

    $$ \mathbf{s}_n = \sum_{i=1}^{n} \mathbf{a}_i $$    1.5.43

is a convergent sequence of vectors. In that case the infinite sum is

    $$ \sum_{i=1}^{\infty} \mathbf{a}_i = \lim_{n \to \infty} \mathbf{s}_n. $$    1.5.44

Proposition 1.5.30 (Absolute convergence implies convergence).
If $\sum_{i=1}^{\infty} |\mathbf{a}_i|$ converges, then $\sum_{i=1}^{\infty} \mathbf{a}_i$ converges.

Proof. Proposition 1.5.9 says that it is enough to check this component by 
component; in one variable, it is a standard statement of elementary calculus 
(Theorem 0.4.11). □ 

Geometric series of matrices 

When he introduced matrices, Cayley remarked that square matrices “comport 
themselves as single quantities.” In many ways, one can think of a square 
matrix as a generalized number; many constructions possible with numbers are 
also possible with matrices. Here we will see that a standard result about the 
sum of a geometric series applies to matrices as well; we will need this result 
when we discuss Newton’s method for solving nonlinear equations, in Section 
2.7. In the exercises we will explore other series of matrices. 

Definitions 1.5.8 and 1.5.29 apply just as well to matrices as to vectors, since
when distances are concerned, if we denote by Mat(n, m) the set of $n \times m$
matrices, then Mat(n, m) is the same as $\mathbb{R}^{nm}$. In particular, Proposition 1.5.9
applies: a series

    $$ \sum_{k=1}^{\infty} A_k $$    1.5.45

of $n \times m$ matrices converges if for each position $(i, j)$ of the matrix, the series
of the entries $(A_k)_{(i,j)}$ converges.

Example: Let

    $$ A = \begin{bmatrix} 0 & 1/4 \\ 0 & 0 \end{bmatrix}. $$

Then $A^2 = 0$ (surprise!), so that the
infinite series of Equation 1.5.48
becomes a finite sum:

    $$ (I - A)^{-1} = I + A, $$

and

    $$ \begin{bmatrix} 1 & -1/4 \\ 0 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 1 & 1/4 \\ 0 & 1 \end{bmatrix}. $$

Recall (Example 0.4.9) that the geometric series $S = a + ar + ar^2 + \cdots$
converges if $|r| < 1$, and that the sum is $a/(1 - r)$. We want to generalize this
to matrices:

Proposition 1.5.31. Let $A$ be a square matrix. If $|A| < 1$, the series

    $$ S = I + A + A^2 + \cdots $$    1.5.48

converges to $(I - A)^{-1}$.

Proof. We use the same trick used in the scalar case of Example 0.4.9. Denote
by $S_k$ the sum of the terms of the series through $A^k$, and subtract from $S_k$ the product
$S_k A$, to get $S_k(I - A) = I - A^{k+1}$:

    $$ \begin{aligned}
       S_k        &= I + A + A^2 + A^3 + \cdots + A^k \\
       S_k A      &= \phantom{I + {}} A + A^2 + A^3 + \cdots + A^k + A^{k+1} \\
       S_k(I - A) &= I - A^{k+1}.
       \end{aligned} $$    1.5.49

We know (Proposition 1.4.11 b) that

    $$ |A^{k+1}| \le |A|^k |A| = |A|^{k+1}, $$    1.5.50

so $\lim_{k \to \infty} A^{k+1} = 0$ when $|A| < 1$, which gives us

    $$ S(I - A) = \lim_{k \to \infty} S_k(I - A) = \lim_{k \to \infty} (I - A^{k+1}) = I - \underbrace{\lim_{k \to \infty} A^{k+1}}_{0} = I. $$    1.5.51

Since $S(I - A) = I$, $S$ is a left inverse of $(I - A)$. If in Equation 1.5.49 we
had written $A S_k$ instead of $S_k A$, the same computation would have given us
$(I - A)S = I$, showing that $S$ is a right inverse. So by Proposition 1.2.14, $S$ is
the inverse of $(I - A)$. □

We will see in Section 2.3 that if
a square matrix has either a right
or a left inverse, that inverse is
necessarily a true inverse; checking
both directions is not actually
necessary.

Corollary 1.5.32. If $|A| < 1$, then $(I - A)$ is invertible.
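Proposition 1.5.31 is easy to test numerically. The sketch below is only an illustration in plain Python, with matrices stored as lists of rows; the helper names (`mat_mul`, `mat_add`, `length`, `geometric_partial_sum`) and the sample matrix are our own choices, and `length` computes the length of a matrix viewed as a vector in $\mathbb{R}^{nm}$. It forms the partial sum $S_k = I + A + \cdots + A^k$ and checks that $S_k(I - A)$ is close to $I$.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]

def mat_add(A, B):
    """Entrywise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def length(A):
    """Length of the matrix viewed as a vector (square root of the sum of squares)."""
    return sum(a * a for row in A for a in row) ** 0.5

def geometric_partial_sum(A, k):
    """Return S_k = I + A + A^2 + ... + A^k."""
    n = len(A)
    S = identity(n)
    power = identity(n)
    for _ in range(k):
        power = mat_mul(power, A)   # power is now the next power of A
        S = mat_add(S, power)
    return S

# A sample matrix with |A| = sqrt(0.30) < 1 (an arbitrary choice).
A = [[0.2, 0.3], [0.1, 0.4]]
I_minus_A = [[(1.0 if i == j else 0.0) - A[i][j] for j in range(2)] for i in range(2)]

S = geometric_partial_sum(A, 60)
product = mat_mul(S, I_minus_A)     # should be close to the identity
error = length(mat_add(product, [[-1.0, 0.0], [0.0, -1.0]]))
print("|A| =", length(A), " |S_60 (I - A) - I| =", error)

# The nilpotent matrix with A^2 = 0: the series stops after I + A.
N = [[0.0, 0.25], [0.0, 0.0]]
print(geometric_partial_sum(N, 5))
```

The error printed is $|A^{61}|$ up to rounding, which is at most $|A|^{61}$ by Proposition 1.4.11; this is the same estimate that drives the proof.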

Corollary 1.5.33. The set of invertible n x n matrices is open. 

Proof. Suppose $B$ is invertible, and $|H| < 1/|B^{-1}|$. Then $|-B^{-1}H| < 1$, so
$I + B^{-1}H$ is invertible (by Corollary 1.5.32), and

    $$ (I + B^{-1}H)^{-1} B^{-1} = \bigl(B(I + B^{-1}H)\bigr)^{-1} = (B + H)^{-1}. $$    1.5.52

Thus if $|H| < 1/|B^{-1}|$, the matrix $B + H$ is invertible, giving an explicit
neighborhood of $B$ made up of invertible matrices. □

1.6 Four Big Theorems

When they were discovered, the
examples of Peano and Cantor
were thought of as aberrations. "I
turn with terror and horror from
this lamentable scourge of continuous
functions with no derivatives
. . . ," wrote Charles Hermite in
1893. Six years later, the French
mathematician Henri Poincaré
lamented the rise of "a rabble of
functions . . . whose only job, it
seems, is to look as little as possible
like decent and useful functions."

"What will the poor student
think?" Poincaré worried. "He
will think that mathematical science
is just an arbitrary heap of
useless subtleties; either he will
turn from it in aversion, or he will
treat it like an amusing game."

Ironically, although Poincaré
wrote that these functions, "specially
invented only to show up the
arguments of our fathers," would
never have any other use, he was
ultimately responsible for showing
that seemingly "pathological"
functions are essential in describing
nature, leading to such fields
as chaos and fractals.

Definition 1.6.1 is amazingly
important, invading whole chapters
of mathematics; it is the basic
"finiteness criterion" for spaces.
Something like half of mathematics
consists of showing that some
space is compact.

In this section we describe a number of results, most only about 100 years 
old or so. They are not especially hard, and were mainly discovered after vari- 
ous mathematicians (Peano, Weierstrass, Cantor) found that many statements 
earlier thought to be obvious were in fact false. 

For example, the statement a curve in the plane has area 0 may seem obvi- 
ous. Yet it is possible to construct a continuous curve that completely fills up a 
triangle, visiting every point at least once! The discovery of this kind of thing 
forced mathematicians to rethink their definitions and statements, putting cal- 
culus on a rigorous basis. 

These results are usually avoided in first and second year calculus. Two 
key statements typically glossed over are the mean value theorem and the inte- 
grability of continuous functions. These are used — indeed, they are absolutely 
central — but often they are not proved. 15 In fact they are not so hard to prove 
when one knows a bit of topology: notions like open and closed sets, and max- 
ima and minima of functions, for example. 

In Section 1.5 we introduced some basic notions of topology. Now we will 
use them to prove Theorem 1.6.2, a remarkable non-constructive result that will 
enable us to prove the existence of a convergent subsequence without knowing 
where it is. We will use this theorem in crucial ways to prove the mean value 
theorem and the fundamental theorem of algebra (this section), to prove the 
spectral theorem for symmetric matrices (Theorem 3.7.12) and to see what 
functions can be integrated (Section 4.3). 

In Definition 1.6.1 below, recall that a subset $X \subset \mathbb{R}^n$ is bounded if it is
contained in a ball centered at the origin (Definition 1.5.12).

Definition 1.6.1 (Compact set). A subset $C \subset \mathbb{R}^n$ is compact if it is
closed and bounded.

The following theorem is as important as the definition, if not more so. 

Theorem 1.6.2 (Convergent subsequence in a compact set). If a
compact set $C \subset \mathbb{R}^n$ contains a sequence $\mathbf{x}_1, \mathbf{x}_2, \dots$, then that sequence has
a convergent subsequence $\mathbf{x}_{i(1)}, \mathbf{x}_{i(2)}, \dots$ whose limit is in $C$.

Note that Theorem 1.6.2 says nothing about what the convergent subse- 
quence converges to; it just says that a convergent subsequence exists. 

15 One exception is Michael Spivak's Calculus.


Figure 1.6.1. 

If the large box contains an infi- 
nite sequence, then one of the four 
quadrants must contain a conver- 
gent subsequence. If that quad- 
rant is divided into four smaller 
boxes, one of those small boxes 
must contain a convergent subse- 
quence, and so on. 


Several more properties of compact
sets are stated and proved in
Appendix A.17.


Proof. The set $C$ is contained in a box $-10^N \le x_i < 10^N$ for some $N$.
Decompose this box into boxes of side 1 in the obvious way. Then at least one
of these boxes, which we'll call $B_0$, must contain infinitely many terms of the
sequence, since the sequence is infinite and we have a finite number of boxes.
Choose some term $\mathbf{x}_{i(0)}$ in $B_0$, and cut up $B_0$ into $10^n$ boxes of side 1/10 (in
the plane, 100 boxes; in $\mathbb{R}^3$, 1,000 boxes). At least one of these smaller boxes
must contain infinitely many terms of the sequence. Call this box $B_1$, and choose
$\mathbf{x}_{i(1)} \in B_1$ with $i(1) > i(0)$. Now keep going: cut up $B_1$ into $10^n$ boxes of
side $1/10^2$; again, one of these boxes must contain infinitely many terms of
the sequence; call one such box $B_2$ and choose an element $\mathbf{x}_{i(2)} \in B_2$ with
$i(2) > i(1)$ ...

Think of the first box $B_0$ as giving the coordinates, up to the decimal point,
of all the points in $B_0$. (Because it is hard to illustrate many levels for a decimal
system, Figure 1.6.1 illustrates the process for a binary system.) The next box,
$B_1$, gives the first digit after the decimal point.16 Suppose, for example, that
$B_0$ has vertices (1,2), (2,2), (1,3) and (2,3); i.e., the point (1,2) has coordinates
x = 1, y = 2, and so on. Suppose further that $B_1$ is the small square at the top
right-hand corner. Then all the points in $B_1$ have coordinates (x = 1.9..., y =
2.9...). When you divide $B_1$ into $10^2$ smaller boxes, the choice of $B_2$ will
determine the next digit; if $B_2$ is at the bottom right-hand corner, then all
points in $B_2$ will have coordinates (x = 1.99..., y = 2.90...), and so on.

Of course you don't actually know what the coordinates of your points are,
because you don't know that $B_1$ is the small square at the top right-hand corner,
or that $B_2$ is at the bottom right-hand corner. All you know is that there exists
a first box $B_0$ of side 1 that contains infinitely many terms of the original
sequence, a second box $B_1 \subset B_0$ of side 1/10 that also contains infinitely many
terms of the original sequence, and so on.

Construct in this way a sequence of nested boxes

    $$ B_0 \supset B_1 \supset B_2 \supset \cdots $$    1.6.1

with $B_m$ of side $10^{-m}$, and each containing infinitely many terms of the
sequence; further choose $\mathbf{x}_{i(m)} \in B_m$ with $i(m + 1) > i(m)$.

Clearly the sequence $\mathbf{x}_{i(m)}$ converges; in fact the mth digit beyond the
decimal point never changes after the mth choice. The limit is in $C$ since $C$ is
closed. □

You may think “what’s the big deal?” To see the troubling implications of 
the proof, consider Example 1.6.3. 

Example 1.6.3 (Convergent subsequence). Consider the sequence

    $$ x_m = \sin 10^m. $$    1.6.2

16 To ensure that all points in the same box have the same decimal expansion, we
should say that our boxes are all open on the top and on the right.
Remember (Section 0.3) that $\cup$
means "union": $A \cup B$ is the set of
elements of either A or B or both.



Figure 1.6.2. Graph of $\sin 2\pi x$. If the
fractional part of a number x is between
0 and 1/2, then $\sin 2\pi x > 0$;
if it is between 1/2 and 1, then
$\sin 2\pi x < 0$.


This is certainly a sequence in the compact set $C = [-1, 1]$, so it contains
a convergent subsequence. But how do you find it? The first step of the
construction above is to divide the interval $[-1, 1]$ into three sub-intervals (our
"boxes"), writing

    $$ [-1, 1] = [-1, 0) \cup [0, 1) \cup \{1\}. $$    1.6.3

Now how do we choose which of the three "boxes" above should be the first
box $B_0$? We know that $x_m$ will never be in the box $\{1\}$, since $\sin\theta = 1$
only if $\theta = \pi/2 + 2k\pi$ for some integer $k$ and (since $\pi$ is irrational) $10^m$ cannot
be $\pi/2 + 2k\pi$. But how do we choose between $[-1, 0)$ and $[0, 1)$? If we want
to choose $[0, 1)$, we must be sure that we have infinitely many positive $x_m$. So,
when is $x_m = \sin 10^m$ positive?

Since $\sin\theta$ is positive for $0 < \theta < \pi$, then $x_m$ is positive when the fractional
part of $10^m/(2\pi)$ is greater than 0 and less than 1/2. (By "fractional part"
we mean the part after the decimal; for example 5/3 = 1 + 2/3 = 1.666...;
the fractional part is .666....) If you don't see this, consider that (as shown in
Figure 1.6.2) $\sin 2\pi\alpha$ depends only on the fractional part of $\alpha$:

    $$ \sin 2\pi\alpha \;
       \begin{cases}
         = 0 & \text{if } \alpha \text{ is an integer or half-integer} \\
         > 0 & \text{if the fractional part of } \alpha \text{ is} < 1/2 \\
         < 0 & \text{if the fractional part of } \alpha \text{ is} > 1/2.
       \end{cases} $$    1.6.4

If instead of writing $x_m = \sin 10^m$ we write

    $$ x_m = \sin 2\pi \frac{10^m}{2\pi}, \quad \text{i.e.,} \quad \alpha = \frac{10^m}{2\pi}, $$    1.6.5

we see, as stated above, that $x_m$ is positive when the fractional part of $10^m/(2\pi)$
is less than 1/2.

So if a convergent subsequence of $x_m$ is contained in the box $[0, 1)$, an infinite
number of $10^m/(2\pi)$ must have a fractional part that is less than 1/2. This will
ensure that we have infinitely many $x_m = \sin 10^m$ in the box $[0, 1)$.

For any single $x_m$, it is enough to know that the first digit of the fractional
part of $10^m/(2\pi)$ is 0, 1, 2, 3 or 4: knowing the first digit after the decimal
point tells you whether the fractional part is less than or greater than 1/2. Since
multiplying by $10^m$ just moves the decimal point to the right by $m$, knowing
whether the fractional part of every $10^m/(2\pi)$ starts this way is really a question
about the decimal expansion of $1/(2\pi)$: do the digits 0, 1, 2, 3 or 4 appear infinitely
many times in the decimal expansion of

    $$ \frac{1}{2\pi} = .1591549\ldots\,? $$    1.6.6

Note that we are not saying that all the $10^m/(2\pi)$ must have the decimal
point followed by 0, 1, 2, 3, or 4! Clearly they don't. We are not interested in
all the $x_m$; we just want to know that we can find a subsequence of $x_m$ that
converges to something inside the box $[0, 1)$. For example, $x_1$ is not in the box
$[0, 1)$, since $10 \times .1591549\ldots = 1.591549\ldots$: the fractional part starts with 5.
Nor is $x_2$, since $10^2 \times .1591549\ldots = 15.91549\ldots$; the fractional part starts with
9. But $x_3$ is in the box $[0, 1)$, since the fractional part of $10^3 \times .1591549\ldots =
159.1549\ldots$ starts with a 1.

The point is that although the
sequence $x_m = \sin 10^m$ is a sequence
in a compact set, and
therefore (by Theorem 1.6.2) contains
a convergent subsequence,
we can't begin to "locate" that
subsequence. We can't even say
whether it is in $[-1, 0)$ or $[0, 1)$.

Recall that compact means
closed and bounded.

Although the use of the words
least upper bound and sup is completely
standard, some people use
maximum as another synonym for
least upper bound, not a least upper
bound that is achieved, as
we have defined it. Similarly,
some people use the words greatest
lower bound and minimum
interchangeably; we do not.

Everyone believes that the digits 0, 1, 2, 3 or 4 appear infinitely many times in
the decimal expansion of $1/(2\pi)$: it is widely believed that $\pi$ is a normal number,
i.e., where every digit appears roughly 1/10th of the time, every pair of digits
appears roughly 1/100th of the time, etc. The first 4 billion digits of $\pi$ have
been computed and appear to bear out this conjecture. Still, no one knows how
to prove it; as far as we know it is conceivable that all the digits after the 10
billionth are 6's, 7's and 8's.

Thus, even choosing the first "box" $B_0$ requires some god-like ability to "see"
this whole infinite sequence, when there is simply no obvious way to do it. △
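The link between the sign of $x_m = \sin 10^m$ and the fractional part of $10^m/(2\pi)$ can at least be checked for the first few terms of Example 1.6.3. The following Python sketch does this for $m = 1, \dots, 6$ using ordinary floating-point arithmetic; the range of $m$ and the variable names are our own choices, and of course a finite computation of this kind says nothing about the infinite tail of the sequence, which is the whole difficulty.

```python
import math

def fractional_part(t):
    """The part of t after the decimal point, for t >= 0."""
    return t - math.floor(t)

rows = []
for m in range(1, 7):
    alpha = 10 ** m / (2 * math.pi)   # x_m = sin(2*pi*alpha)
    frac = fractional_part(alpha)
    x_m = math.sin(10 ** m)
    rows.append((m, frac, x_m))
    print(m, round(frac, 4), round(x_m, 4))

# Indices m (among the first six) for which x_m lies in the box [0, 1).
positive_indices = [m for m, frac, x_m in rows if x_m > 0]
print(positive_indices)   # [3, 5]
```

Among these six terms only $x_3$ and $x_5$ are positive, matching the digits of $.1591549\ldots$: the fractional part of $10^m/(2\pi)$ begins with the digit in position $m+1$, and only for $m = 3$ and $m = 5$ is that digit (1 and 4 respectively) less than 5.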

Theorem 1.6.2 is non-constructive: it proves that something exists but gives
not the slightest hint of how to find it. Many mathematicians of the end of
the 19th century were deeply disturbed by this type of proof; even today, a
school of mathematicians called the intuitionists reject this sort of thinking. 
They demand that in order for a number to be determined, one give a rule 
which allows the computation of the successive decimals. Intuitionists are pretty 
scarce these days: we have never met one. But we have a certain sympathy 
with their views, and much prefer proofs that involve effectively computable 
algorithms, at least implicitly. 

Continuous functions on compact sets 

We can now explore some of the consequences of Theorem 1.6.2. 

One is that a continuous function defined on a compact subset has both a
maximum and a minimum. Recall from first year calculus and from Section
0.4 the difference between a least upper bound and a maximum (similarly, the
difference between a greatest lower bound and a minimum).

Definition 1.6.4 (Least upper bound). A number $x$ is the least upper
bound of a function $f$ defined on a set $C$ if $x$ is the smallest number such
that $f(a) \le x$ for all $a \in C$. It is also called supremum, abbreviated sup.

Definition 1.6.5 (Maximum). A number $x$ is the maximum of a function
$f$ defined on a set $C$ if it is the least upper bound of $f$ and there exists $b \in C$
such that $f(b) = x$.
For example, on the open set (0, 1) the least upper bound of $f(x) = x^2$ is
1, and $f$ has no maximum. On the closed set [0, 1], 1 is both the least upper
bound and the maximum of $f$.


On the open set (0, 1) the greatest
lower bound of $f(x) = x^2$ is
0, and $f$ has no minimum. On
the closed set [0, 1], 0 is both the
greatest lower bound and the minimum
of $f$.

Recall that “compact" means 
closed and bounded. 


Definition 1.6.6 (Greatest lower bound, minimum). A number $y$ is
the greatest lower bound of a function $f$ defined on a set $C$ if $y$ is the largest
number such that $f(a) \ge y$ for all $a \in C$. The word infimum, abbreviated
inf, is a synonym for greatest lower bound. The number $y$ is the minimum
of $f$ if there exists $b \in C$ such that $f(b) = y$.

Theorem 1.6.7 (Existence of minima and maxima). Let $C \subset \mathbb{R}^n$ be a
compact subset, and $f : C \to \mathbb{R}$ be a continuous function. Then there exists
a point $\mathbf{a} \in C$ such that $f(\mathbf{a}) \ge f(\mathbf{x})$ for all $\mathbf{x} \in C$, and a point $\mathbf{b} \in C$ such
that $f(\mathbf{b}) \le f(\mathbf{x})$ for all $\mathbf{x} \in C$.

Here are some examples to show that the conditions in the theorem are
necessary. Consider the function

    $$ f(x) = \begin{cases} 0 & \text{when } x = 0 \\ 1/x & \text{otherwise,} \end{cases} $$    1.6.7

defined on the compact set [0, 1]. As $x \to 0$, we see that $f(x)$ blows up to
infinity; the function does not have a maximum (it is not bounded). This
function is not continuous, so Theorem 1.6.7 does not apply to it.

The function $f(x) = 1/x$, defined on (0, 1], is continuous but it has no
maximum either; this time the problem is that (0, 1] is not closed, hence not
compact. And the function $f(x) = x$, defined on all of $\mathbb{R}$, is not bounded either;
this time the problem is that $\mathbb{R}$ is not bounded, hence not compact. Exercise
1.6.1 asks you to show that if $A \subset \mathbb{R}^n$ is any non-compact subset, then there
always is a continuous unbounded function on $A$.


Proof. The proof is by contradiction. Assume $f$ is unbounded. Then for
any integer $N$, no matter how large, there exists a point $\mathbf{x}_N \in C$ such that
$|f(\mathbf{x}_N)| > N$. By Theorem 1.6.2, the sequence $\mathbf{x}_N$ must contain a convergent
subsequence $\mathbf{x}_{N(j)}$, which converges to some point $\mathbf{b} \in C$. Since $f$ is continuous
at $\mathbf{b}$, then for any $\epsilon > 0$, there exists a $\delta > 0$ such that when $|\mathbf{x} - \mathbf{b}| < \delta$, then
$|f(\mathbf{x}) - f(\mathbf{b})| < \epsilon$; i.e., $|f(\mathbf{x})| < |f(\mathbf{b})| + \epsilon$.

Since the $\mathbf{x}_{N(j)}$ converge to $\mathbf{b}$, we will have $|\mathbf{x}_{N(j)} - \mathbf{b}| < \delta$ for $j$ sufficiently
large. But as soon as $N(j) > |f(\mathbf{b})| + \epsilon$, we have

    $$ |f(\mathbf{x}_{N(j)})| > N(j) > |f(\mathbf{b})| + \epsilon, $$    1.6.8

a contradiction.

Therefore, the set of values of $f$ is bounded, which means that $f$ has a least
upper bound $M$. What we now want to show is that $f$ has a maximum: that
there exists a point $\mathbf{a} \in C$ such that $f(\mathbf{a}) = M$.
There is a sequence $\mathbf{x}_i$ such that

    $$ \lim_{i \to \infty} f(\mathbf{x}_i) = M. $$    1.6.9

We can again extract a convergent subsequence $\mathbf{x}_{i(m)}$ that converges to some
point $\mathbf{a} \in C$. Then, since $\mathbf{a} = \lim_{m \to \infty} \mathbf{x}_{i(m)}$ and $f$ is continuous,

    $$ f(\mathbf{a}) = \lim_{m \to \infty} f(\mathbf{x}_{i(m)}) = M. $$    1.6.10

The proof for the minimum works the same way. □

We will have several occasions to use Theorem 1.6.7. First, we need the 
following proposition, which you no doubt proved in first year calculus. 

Proposition 1.6.8. If a function g defined and differentiable on an open in- 
terval in R has a maximum (respectively a minimum) at c, then its derivative 
at c is 0. 


Proof. We will prove it only for the maximum. If $g$ has a maximum at $c$, then
$g(c) - g(c + h) \ge 0$, so

    $$ \frac{g(c) - g(c + h)}{h} \;
       \begin{cases} \ge 0 & \text{if } h > 0 \\ \le 0 & \text{if } h < 0; \end{cases} $$

i.e.,

    $$ \lim_{h \to 0} \frac{g(c) - g(c + h)}{h} $$    1.6.11

is simultaneously $\le 0$ and $\ge 0$, so it is 0. Since this limit is $-g'(c)$, the
derivative of $g$ at $c$ is 0. □

An essential application of Theorem 1.6.7 and Proposition 1.6.8 is the mean
value theorem, without which practically nothing in differential calculus can be
proved. The mean value theorem says that you can't drive 60 miles in an hour
without going exactly 60 mph at one instant at least: the average change in $f$
over the interval $(a, b)$ is the derivative of $f$ at some point $c \in (a, b)$.

Stated in terms of cars, the
mean value theorem may seem obvious.
But notice that the theorem
does not require that the derivative
be continuous: even if it were
possible for a car to jump from going
59 mph to going 61 mph, without
ever passing through 60 mph,
it would still be true that a car
that traveled 60 miles in an hour
would have at some instant to be
going 60 mph.

Theorem 1.6.9 (Mean value theorem). If $f : [a, b] \to \mathbb{R}$ is continuous,
and $f$ is differentiable on $(a, b)$, then there exists $c \in (a, b)$ such that

    $$ f'(c) = \frac{f(b) - f(a)}{b - a}. $$    1.6.12

Note that $f$ is defined on the closed and bounded interval $[a, b]$, but we must
specify the open interval $(a, b)$ when we talk about where $f$ is differentiable.17
If we think that $f$ measures position as a function of time, then the right-hand
side of Equation 1.6.12 measures average speed over the time interval $b - a$.

17 One could have a left-hand and right-hand derivative at the endpoints, but we
are not assuming that such one-sided derivatives exist.
Figure 1.6.3. A race between hare and
tortoise ends in a dead heat. The
function $f$ represents the progress
of the hare, starting at time $a$ and
ending at time $b$. He speeds ahead,
overshoots the mark, and returns.
Slow-and-steady tortoise is represented
by $g(x) = f(a) + m(x - a)$.


Even if the coefficients $a_n$ are
real, the fundamental theorem of
algebra does not guarantee that
the polynomial has any real roots;
the roots may be complex.


Proof. Think of a function $f$ as representing distance traveled (by a car or, as
in Figure 1.6.3, by a hare). The distance the hare travels in the time interval
$b - a$ is $f(b) - f(a)$, so its average speed is

    $$ m = \frac{f(b) - f(a)}{b - a}. $$    1.6.13

The function $g$ represents the steady progress of a tortoise starting at $f(a)$ and
constantly maintaining that average speed (alternatively, a car set on cruise
control):

    $$ g(x) = f(a) + m(x - a). $$

The function $h$ measures the distance between $f$ and $g$:

    $$ h(x) = f(x) - g(x) = f(x) - \bigl(f(a) + m(x - a)\bigr). $$    1.6.14

It is a continuous function on $[a, b]$, and $h(a) = h(b) = 0$. (The hare and the
tortoise start together and finish in a dead heat.)

If $h$ is 0 everywhere, then $f(x) = g(x) = f(a) + m(x - a)$ has derivative $m$
everywhere, so the theorem is true.

If $h$ is not 0 everywhere, then it must take on positive values or negative
values somewhere, so (by Theorem 1.6.7, since $h$ is continuous on the compact
interval $[a, b]$) it must have a positive maximum or a negative minimum,
or both. Let $c$ be a point where it has such an extremum; since $h(a) = h(b) = 0$,
we have $c \in (a, b)$, so $h$ is differentiable at $c$, and by Proposition 1.6.8, $h'(c) = 0$.

This gives $0 = h'(c) = f'(c) - m$. (In Equation 1.6.14, $x$ appears only twice;
the $f(x)$ contributes $f'(c)$ and the $-mx$ contributes $-m$.) □
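The construction in this proof can be imitated numerically. The Python sketch below is only an illustration under our own assumptions: the function name `mean_value_point`, the grid search, and the sample function $f(x) = x^3$ on $[0, 2]$ are not from the text. It forms $h = f - g$, looks for the interior grid point where $|h|$ is largest (a positive maximum or a negative minimum of $h$), and then compares a finite-difference estimate of $f'(c)$ with the average slope $m$.

```python
def mean_value_point(f, a, b, steps=20000):
    """Approximate a point c in (a, b) where h = f - g has an extremum.

    g(x) = f(a) + m (x - a) is the "tortoise" line of slope m.
    Returns the pair (m, c). This is a crude grid search, not an exact method.
    """
    m = (f(b) - f(a)) / (b - a)

    def h(x):
        return f(x) - (f(a) + m * (x - a))

    best_x, best_abs = None, -1.0
    for i in range(1, steps):            # interior grid points only
        x = a + (b - a) * i / steps
        value = abs(h(x))
        if value > best_abs:
            best_x, best_abs = x, value
    return m, best_x

def cube(x):
    return x ** 3

m, c = mean_value_point(cube, 0.0, 2.0)

# Central-difference estimate of f'(c); it should be close to m.
eps = 1e-6
derivative_at_c = (cube(c + eps) - cube(c - eps)) / (2 * eps)
print("m =", m, " c =", c, " f'(c) is approximately", derivative_at_c)
```

For this sample function $m = 4$ and $h(x) = x^3 - 4x$, whose only extremum in $(0, 2)$ is at $c = 2/\sqrt{3}$, so the grid search should return a value near 1.1547.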


The fundamental theorem of algebra 

The fundamental theorem of algebra is one of the most important results of 
all mathematics, with a history going back to the Greeks and Babylonians. It 
was not proved satisfactorily until about 1830. The theorem asserts that every 
polynomial has roots. 

Theorem 1.6.10 (Fundamental theorem of algebra). Let

    $$ p(z) = z^k + a_{k-1} z^{k-1} + \cdots + a_0 $$    1.6.15

be a polynomial of degree $k > 0$ with complex coefficients. Then $p$ has a
root: there exists a complex number $z_0$ such that $p(z_0) = 0$.

When $k = 1$, this is clear: the unique root is $z_0 = -a_0$.

When $k = 2$, the famous quadratic formula tells you that the roots are

    $$ \frac{-a_1 \pm \sqrt{a_1^2 - 4a_0}}{2}. $$    1.6.16



Niels Henrik Abel, born in 
1802, assumed responsibility for a 
younger brother and sister after 
the death of their alcoholic father 
in 1820. For years he struggled 
against poverty and illness, trying 
to obtain a position that would al- 
low him to marry his fiancee; he 
died from tuberculosis at the age 
of 26, without learning that he 
had been appointed professor in 
Berlin. 

Evariste Galois, born in 1811, 
twice failed to win admittance to 
Ecole Polytechnique in Paris, the 
second time shortly after his father's
suicide. In 1831 he was
imprisoned for making an implied 
threat against the king at a repub- 
lican banquet; he was acquitted 
and released about a month later. 
He was 20 years old when he died 
from wounds received in a duel. 

At the time Gauss gave his 
proof of Theorem 1.6.10, complex 
numbers were not sufficiently re- 
spectable that they could be men- 
tioned in a rigorous paper: Gauss 
stated his theorem in terms of real 
polynomials. For a discussion of 
complex numbers, see Section 0.6. 


The absolute value of a complex
number $z = x + iy$ is

    $$ |z| = \sqrt{x^2 + y^2}. $$
(Recall above that the coefficient of $z^2$ is 1.) This was known to the Greeks
and Babylonians.

The cases $k = 3$ and $k = 4$ were solved in the 16th century by Cardano and
others; their solutions are presented in Section 0.6.

For the next two centuries, an intense search failed to find anything analogous
for equations of higher degree. Finally, around 1830, two young mathematicians
with tragic personal histories, the Norwegian Niels Henrik Abel and the Frenchman
Evariste Galois, proved that no analogous formulas exist in degrees 5 and
higher. Again, these discoveries opened new fields in mathematics.

Several mathematicians (Laplace, d’Alembert, Gauss) had earlier come to 
suspect that the fundamental theorem was true, and tried their hands at proving 
it. In the absence of topological tools, their proofs were necessarily short on 
rigor, and the criticism each heaped on his competitors does not reflect well on 
any of them. Although the first correct proof is usually attributed to Gauss 
(1799), we will present a modern version of d’Alembert’s argument (1746). 

Unlike the quadratic formula and Cardano's formulas, our proof does not
provide a recipe to find a root. (Indeed, as we mentioned above, Abel and
Galois proved that no recipes analogous to Equation 1.6.16 exist.) This is a
serious problem: one very often needs to solve polynomials, and to this day 
there is no really satisfactory way to do it; the picture on the cover of this 
text is an attempt to solve a polynomial of degree 256. There is an enormous 
literature on the subject. 

Proof of 1.6.10. We want to show that there exists a number $z$ such that
$p(z) = 0$. The strategy of the proof is first to establish that $|p(z)|$ has a
minimum, and next to establish that its minimum is in fact 0. To establish
that $|p(z)|$ has a minimum, we will show that there is a disk around the origin
such that every $z$ outside the disk gives a value $|p(z)|$ that is greater than $|p(0)|$.
The disk we create is closed and bounded, and $|p(z)|$ is a continuous function,
so by Theorem 1.6.7 there is a point $z_0$ inside the disk such that $|p(z_0)|$ is the
minimum of the function on the disk. It is also the minimum of the function
everywhere, by the preceding argument. Finally (and this will be the main
part of the argument) we will show that $p(z_0) = 0$.

We shall create our disk in a rather crude fashion; the radius of the disk we
establish will be greater than we really need. First, $|p(z)|$ can be at least as
small as $|a_0|$, since when $z = 0$, Equation 1.6.15 gives $p(0) = a_0$. So we want to
show that for $|z|$ big enough, $|p(z)| > |a_0|$. The “big enough” will be the radius
of our disk; we will then know that the minimum inside the disk is the global
minimum for the function.

It is clear that for $|z|$ large, $|z^k|$ is much larger. What we have to ascertain
is that when $|z|$ is very big, $|p(z)| > |a_0|$: the size of the other terms,

$$|a_{k-1}z^{k-1} + \cdots + a_1 z + a_0|, \tag{1.6.17}$$

will not compensate enough to make $|p(z)| < |a_0|$.
The notation $\sup\{|a_{k-1}|, \dots, |a_0|\}$ means the largest of
$|a_{k-1}|, \dots, |a_0|$.

The triangle inequality can also be stated as

$$|\mathbf{v}| - |\mathbf{w}| \le |\mathbf{v} + \mathbf{w}|,$$

since

$$|\mathbf{v}| = |\mathbf{v} + \mathbf{w} - \mathbf{w}| \le |\mathbf{v} + \mathbf{w}| + |-\mathbf{w}| = |\mathbf{v} + \mathbf{w}| + |\mathbf{w}|.$$

First, choose the largest of the coefficients $|a_{k-1}|, \dots, |a_0|$ and call it $A$:

$$A = \sup\{|a_{k-1}|, \dots, |a_0|\}. \tag{1.6.18}$$

Then if $|z| = R$, and $R > 1$, we have

$$\begin{aligned}
|a_{k-1}z^{k-1} + \cdots + a_1 z + a_0| &\le AR^{k-1} + \cdots + AR + A \\
&\le AR^{k-1} + \cdots + AR^{k-1} + AR^{k-1} = kAR^{k-1}.
\end{aligned} \tag{1.6.19}$$

To get from the first to the second line of Equation 1.6.19 we multiplied all
the terms on the right-hand side, except the first, by $R^1, R^2, \dots$ up to $R^{k-1}$ in
order to get an $R^{k-1}$ in all $k$ terms, giving $kAR^{k-1}$ in all. (We don’t need to
make this term so very big; we’re being extravagant in order to get a relatively
simple expression for the sum. This is not a case where one has to be delicate
with inequalities.)

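Now, when $|z| = R$, we have

$$|p(z)| = |\underbrace{z^k}_{\text{abs. value } R^k} + \underbrace{a_{k-1}z^{k-1} + \cdots + a_0}_{\text{abs. value } \le kAR^{k-1}}|, \tag{1.6.20}$$

so using the triangle inequality,

$$|p(z)| \ge |z^k| - |a_{k-1}z^{k-1} + \cdots + a_1 z + a_0| \ge R^k - kAR^{k-1} = R^{k-1}(R - kA). \tag{1.6.21}$$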



Of course $R^{k-1}(R - kA) \ge |a_0|$ when $R = \max\{kA + |a_0|, 1\}$. So now we
know that any $z$ chosen outside the disk of radius $R$ will give $|p(z)| > |a_0|$, as
shown in Figure 1.6.4. If the function has a minimum, that minimum has to
be inside the disk. Moreover, we know by Theorem 1.6.7 that it does have a
minimum inside the disk. We will denote by $z_0$ a point inside the disk at which
the function achieves its minimum.
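
The radius chosen in this argument is easy to compute. What follows is a minimal Python sketch, not part of the original text: the function names `disk_radius`, `p` and `min_modulus_on_circle`, and the sample polynomial $z^3 - 2z + 5$, are illustrative assumptions. It computes $R = \max\{kA + |a_0|, 1\}$ and samples $|p(z)|$ on circles of radius at least $R$; sampling illustrates the inequality but of course does not prove it.

```python
import cmath


def disk_radius(coeffs):
    """coeffs = [a_0, a_1, ..., a_{k-1}] for the monic polynomial
    p(z) = z^k + a_{k-1} z^{k-1} + ... + a_0."""
    k = len(coeffs)
    A = max(abs(a) for a in coeffs)      # Equation 1.6.18
    return max(k * A + abs(coeffs[0]), 1)


def p(z, coeffs):
    """Evaluate the monic polynomial by Horner's rule."""
    value = 1
    for a in reversed(coeffs):
        value = value * z + a
    return value


def min_modulus_on_circle(coeffs, radius, samples=360):
    """Smallest sampled value of |p(z)| over points with |z| = radius."""
    return min(
        abs(p(radius * cmath.exp(2j * cmath.pi * t / samples), coeffs))
        for t in range(samples)
    )


coeffs = [5, -2, 0]                      # hypothetical example: z^3 - 2z + 5
R = disk_radius(coeffs)                  # k = 3, A = 5, so R = 3*5 + 5 = 20
print(R, min_modulus_on_circle(coeffs, R), min_modulus_on_circle(coeffs, 2 * R))
```

As the text says, this radius is far larger than necessary; the sampled minimum on the circle of radius $R$ is in the thousands, while $|a_0| = 5$.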
The big question is: is $z_0$ a root of the polynomial? Is it true that $|p(z_0)| = 0$?
Earlier we used the fact that for $|z|$ large, $|z^k|$ is very large. Now we will use the
fact that for $|z|$ small, $|z^k|$ is very small. We will also take into account that we
are dealing with complex numbers. The preceding argument works just as well
with real numbers, but now we will need the fact that when a complex number
is written in terms of its length $r$ and polar angle $\theta$, taking its power has the
following effect:

$$\bigl(r(\cos\theta + i\sin\theta)\bigr)^k = r^k(\cos k\theta + i\sin k\theta). \tag{1.6.22}$$


You might object, what happens to the middle terms, for example, the
$2a_2 z_0 u$ in $a_2(z_0 + u)^2 = a_2 z_0^2 + 2a_2 z_0 u + a_2 u^2$? But that is
a term in $u$ with coefficient $2a_2 z_0$, so the coefficient $2a_2 z_0$ just gets
added to $b_1$, the coefficient of $u$.


As you choose different values of $\theta$, the number $z = r(\cos\theta + i\sin\theta)$ travels on a circle
of radius $r$. If you raise that number to the $k$th power, then it travels around a
much smaller circle (for $r$ small), going much faster: $k$ times around for every
one time around the original circle.
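
Equation 1.6.22 states that the $k$th power of $r(\cos\theta + i\sin\theta)$ is $r^k(\cos k\theta + i\sin k\theta)$. A quick numerical check is sketched below; the helper name `polar_power` and the sample values are assumptions made for illustration only.

```python
import cmath
import math


def polar_power(r, theta, k):
    """r^k (cos(k*theta) + i sin(k*theta)), the right side of Equation 1.6.22."""
    return (r ** k) * complex(math.cos(k * theta), math.sin(k * theta))


r, theta, k = 0.5, 0.7, 3                # hypothetical sample values
z = cmath.rect(r, theta)                 # r (cos theta + i sin theta)
print(abs(z ** k - polar_power(r, theta, k)))   # difference is at rounding level
print(abs(z ** k), r ** k)               # the new circle has radius r^k
```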

The formulas in this last part of the proof may be hard to follow, so first we
will outline what we are going to do. We are going to argue by contradiction,
assuming that $p(z_0) \ne 0$, and seeing that we land on an impossibility. We will
then see that $|p(z_0)|$ is not the minimum, because there exists a point $z$ such that
$|p(z)| < |p(z_0)|$. Since we have already proved that $|p(z_0)|$ is the minimum, our
assumption that $p(z_0) \ne 0$ is false.

We start with a change of variables; it will be easier to consider numbers in
a circle around $z_0$ if we treat $z_0$ as the origin. So set $z = z_0 + u$, and consider
the function

$$\begin{aligned}
p(z) = z^k + a_{k-1}z^{k-1} + \cdots + a_0 &= (z_0 + u)^k + a_{k-1}(z_0 + u)^{k-1} + \cdots + a_0 \\
&= u^k + b_{k-1}u^{k-1} + \cdots + b_0 = q(u),
\end{aligned} \tag{1.6.23}$$

where

$$b_0 = z_0^k + a_{k-1}z_0^{k-1} + \cdots + a_0 = p(z_0). \tag{1.6.24}$$

This is a polynomial of degree $k$ in $u$. We have grouped together all the terms
that don’t contain $u$ and called them $b_0$.

Now, looking at our function $q(u)$ of Equation 1.6.23, we choose the term
with the smallest power $j > 0$ that has a nonzero coefficient. (For example, if
we had $q(u) = u^4 + 2u^2 + 3u + 10$, that term, which we call $b_j u^j$, would be $3u$;
if we had $q(u) = u^5 + 2u^4 + 5u^3 + 1$, that term would be $5u^3$.) We rewrite our
function as follows:

$$q(u) = b_0 + b_j u^j + \underbrace{(b_{j+1}u^{j+1} + \cdots + u^k)}_{\text{abs. val. smaller than } |b_j u^j| \text{ for small } u}. \tag{1.6.25}$$

Exercise 1.6.2 asks you to justify that $|b_{j+1}u^{j+1} + \cdots + u^k| < |b_j u^j|$ for small
$u$. The construction is illustrated in Figure 1.6.5.
Because there may be lots of little terms $b_{j+1}u^{j+1} + \cdots + u^k$, you
might imagine that the first dog holds a shorter leash for a smaller
dog who is running around him, that smaller dog holding a yet
shorter leash for a yet smaller dog who is running around him ....

Exercise 1.6.6 asks you to prove that every polynomial over the
complex numbers can be factored into linear factors, and that every
polynomial over the real numbers can be factored into linear factors
and quadratic factors with complex roots.

Recall that $\rho$ is pronounced “rho.”


FIGURE 1.6.5. The point $p(z_0) = b_0$ (the flagpole) is the closest that $p(z)$ ever comes
to the origin, for all $z$. The assumption that the flagpole is different from the origin
($b_0 \ne 0$) leads to a contradiction: if $|u|$ is small, then as $z = z_0 + u$ takes a walk
around $z_0$ (shown at left), $p(z)$ (the dog) goes around the flagpole and will at some
point be closer to the origin than is the flagpole itself (shown at right).


Now consider our number $u$ written in terms of length $\rho$ and polar angle $\theta$:

$$u = \rho(\cos\theta + i\sin\theta). \tag{1.6.26}$$

The numbers $z = z_0 + u$ then turn in a circle of radius $\rho$ around $z_0$ as we
change the angle $\theta$. What about the numbers $p(z)$? If we were to forget about
the small terms grouped in parentheses on the right-hand side of Equation
1.6.25, we would say that these points travel in a circle of radius $|b_j|\rho^j$ (which
is small when $\rho$ is small) around the point $b_0 = p(z_0)$. We would then see, as
shown in Figure 1.6.5, that if $|b_j|\rho^j < |b_0|$, some of these points are between $b_0$ and
0; i.e., they are closer to the origin than $b_0$. If we ignore the small terms, this would mean
that there exists a number $z$ such that $|p(z)| < |p(z_0)|$, contradicting the fact,
which we have proved, that $|p(z_0)|$ is the minimum of the function.

Of course we can’t quite ignore the small terms, but we can show that they
don’t affect our conclusion. Think of $b_0$ as a flagpole and $b_0 + b_j u^j$, with $|u| = \rho$,
as a man walking on a circle of radius $|b_j|\rho^j$ around that flagpole. He is walking
a dog that is running circles around him, restrained by a leash of radius less than
$|b_j|\rho^j$, for $\rho$ sufficiently small. The leash represents the small terms. So when
the man is between 0 and the flagpole, the dog, which represents the point $p(z)$,
is closer to 0 than is the flagpole. That is, $|p(z)|$ is less than $|b_0| = |p(z_0)|$. This
is impossible, because we proved that $|p(z_0)|$ is the minimum of our function.
Therefore, our assumption that $p(z_0) \ne 0$ is false. □
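
The man-and-flagpole argument can be tried numerically. The sketch below is an illustration under stated assumptions, not the book's construction verbatim: the helper names are invented, the polynomial is given by its full coefficient list from the constant term up, and the point `z0` is any point with $p(z_0) \ne 0$. It computes the coefficients $b_i$ of $q(u) = p(z_0 + u)$, finds the smallest $j > 0$ with $b_j \ne 0$, and picks the angle of $u$ so that $b_j u^j$ points from $b_0$ straight toward the origin.

```python
import cmath


def evaluate(coeffs, z):
    """Evaluate p(z) = coeffs[0] + coeffs[1] z + ... by Horner's rule."""
    value = 0
    for c in reversed(coeffs):
        value = value * z + c
    return value


def shifted_coefficients(coeffs, z0):
    """Return [b_0, ..., b_k] with p(z0 + u) = b_0 + b_1 u + ... + b_k u^k."""
    b = list(coeffs)
    n = len(b)
    for i in range(n - 1):               # repeated synthetic division by (z - z0)
        for m in range(n - 2, i - 1, -1):
            b[m] += z0 * b[m + 1]
    return b


def closer_point(coeffs, z0, rho=1e-2):
    """Return z = z0 + u with |u| = rho, aimed so that b_j u^j points from b_0 toward 0."""
    b = shifted_coefficients(coeffs, z0)
    j = next(i for i in range(1, len(b)) if abs(b[i]) > 1e-12)
    theta = (cmath.phase(-b[0]) - cmath.phase(b[j])) / j
    return z0 + cmath.rect(rho, theta)


coeffs = [1, 0, 1]                       # hypothetical example: p(z) = z^2 + 1
z0 = 0                                   # minimum of |p| over the reals; p(z0) = 1
z = closer_point(coeffs, z0)
print(z, abs(evaluate(coeffs, z)), abs(evaluate(coeffs, z0)))
```

For $p(z) = z^2 + 1$ the point $z_0 = 0$ minimizes $|p|$ among real numbers, yet the sketch steps in the imaginary direction and finds a smaller value. This is the place where the proof needs complex numbers. The default `rho` is assumed small enough for the leading term to dominate; a different polynomial might need a smaller value.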


The proof of the fundamental theorem of algebra illustrates the kind of thing
we meant when we said, in the beginning of Section 1.4, that calculus is about
“some terms being dominant or negligible compared to other terms.”
1.7 Differential Calculus: Replacing Nonlinear
Transformations by Linear Transformations

Born: I should like to put to Herr Einstein a question, namely, how 
quickly the action of gravitation is propagated in your theory .... 

Einstein: It is extremely simple to write down the equations for the case 
when the perturbations that one introduces in the field are infinitely small. 

. . . The perturbations then propagate with the same velocity as light. 

Born: But for great perturbations things are surely very complicated? 

Einstein: Yes, it is a mathematically complicated problem. It is especially
difficult to find exact solutions of the equations, as the equations are 
nonlinear. — Discussion after lecture by Einstein in 1913 


The object of differential calculus is to study nonlinear mappings
by replacing them with linear transformations; we replace nonlinear
equations with linear equations, curved surfaces by their tangent
planes, and so on.


As mentioned in Section 1.3, in real life (and in pure mathematics as well) a 
great many problems of interest are not linear; one must consider the effects of 
feedback. A pendulum is an obvious example: if you push it so that it moves 
away from you, eventually it will swing back. Second-order effects in other 
problems may be less obvious. If one company cuts costs by firing workers, it 
will probably increase profits; if all its competitors do the same, no one company 
will gain a competitive advantage; if enough workers lose jobs, who will buy 
the company’s products? Modeling the economy is notoriously difficult, but 
second-order effects also complicate behavior of mechanical systems. 

The object of differential calculus is to study nonlinear mappings by replacing 
them with linear transformations. Of course, this linearization is useful only if 
you understand linear objects reasonably well. Also, this replacement is only 
more or less justified. Locally, near the point of tangency, a curved surface may 
be very similar to its tangent plane, but further away it isn’t. The hardest part 
of differential calculus is determining when replacing a nonlinear object by a 
linear one is justified. 

In Section 1.3 we studied linear transformations in $\mathbb{R}^n$. Now we will see
what this study contributes to the study of nonlinear transformations, more
commonly called mappings.

This isn’t actually a reasonable description: nonlinear is much too broad a 
class to consider. Dividing mappings into linear and nonlinear is like dividing 
people into left-handed cello players and everyone else. We will study a limited 
subset of nonlinear mappings: those that are, in a sense we will study with care, 
“well approximated by linear transformations.” 


Derivatives and linear approximation in one dimension 

In one dimension, the derivative is the main tool used to linearize a function. 
Recall from one variable calculus the definition of the derivative: 
The derivative of a function $f : \mathbb{R} \to \mathbb{R}$, evaluated at $a$, is

$$f'(a) = \lim_{h \to 0} \frac{1}{h}\bigl(f(a + h) - f(a)\bigr). \tag{1.7.1}$$

Limiting the domain as we do in Definition 1.7.1 is necessary, because
many interesting functions are not defined on all of $\mathbb{R}$, but
they are defined on an appropriate open subset $U \subset \mathbb{R}$. Such functions
as $\log x$, $\tan x$ and $1/x$ are not defined on all of $\mathbb{R}$; for example,
$1/x$ is not defined at 0. So if we used Equation 1.7.1 as our definition,
$\tan x$ or $\log x$ or $1/x$ would not be differentiable.

Although it sounds less friendly, we really should say:

Definition 1.7.1 (Derivative). Let $U$ be an open subset of $\mathbb{R}$, and
$f : U \to \mathbb{R}$ a function. Then $f$ is differentiable at $a \in U$ if the limit

$$f'(a) = \lim_{h \to 0} \frac{1}{h}\bigl(f(a + h) - f(a)\bigr) \quad \text{exists.} \tag{1.7.2}$$

Students often find talk about open sets $U \subset \mathbb{R}$ and domains of definition
pointless; what does it mean when we talk about a function $f : U \to \mathbb{R}$? This
is the same as saying $f : \mathbb{R} \to \mathbb{R}$, except that $f(x)$ is only defined if $x$ is in $U$.

Example 1.7.2 (Derivative of a function from $\mathbb{R} \to \mathbb{R}$). If $f(x) = x^2$,
then $f'(x) = 2x$. This is proved by writing

$$f'(x) = \lim_{h \to 0} \frac{1}{h}\bigl((x + h)^2 - x^2\bigr) = \lim_{h \to 0} \frac{1}{h}(2xh + h^2) = 2x + \lim_{h \to 0} h = 2x. \tag{1.7.3}$$
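
The limit in Equation 1.7.3 can be watched numerically. This is a small illustrative sketch; the helper names `difference_quotient` and `square` and the base point 3 are assumptions, not part of the text.

```python
def difference_quotient(f, a, h):
    """(f(a + h) - f(a)) / h, the quantity whose limit is taken in Equation 1.7.1."""
    return (f(a + h) - f(a)) / h


def square(x):
    return x * x


for h in (1e-1, 1e-2, 1e-3):
    # For f(x) = x^2 the quotient is 2a + h up to rounding, so it approaches 6.
    print(h, difference_quotient(square, 3.0, h))
```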


We discussed in Remark 1.5.1 why it is necessary to specify an
open set when talking about derivatives.

Exercises 1.7.1, 1.7.2, 1.7.3, and 1.7.4 provide some review of
tangents and derivatives.


The derivative $2x$ of the function $f(x) = x^2$ is the slope of the line tangent
to $f$ at $x$; one also says that $2x$ is the slope of the graph of $f$ at $x$. In higher
dimensions, this idea of the slope of the tangent to a function still holds,
although already in two dimensions, picturing a plane tangent to a surface is
considerably more difficult than picturing a line tangent to a curve. △

Partial derivatives 


One kind of derivative of a function of several variables works just like a
derivative of a function of one variable: take the derivative with respect to one
variable, treating all the others as constants.


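Definition 1.7.3 (Partial derivative). Let $U$ be an open subset of $\mathbb{R}^n$
and $f : U \to \mathbb{R}$ a function. The partial derivative of $f$ with respect to the
$i$th variable, and evaluated at $\mathbf{a}$, is the limit

$$D_i f(\mathbf{a}) = \lim_{h \to 0} \frac{1}{h}\left( f\begin{pmatrix} a_1 \\ \vdots \\ a_i + h \\ \vdots \\ a_n \end{pmatrix} - f\begin{pmatrix} a_1 \\ \vdots \\ a_i \\ \vdots \\ a_n \end{pmatrix} \right), \tag{1.7.4}$$

if the limit exists, of course.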
The partial derivative $D_1 f$ measures change in the direction
of the vector $\mathbf{e}_1$; the partial derivative $D_2 f$ measures change in the
direction of the vector $\mathbf{e}_2$; and so on.


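We can rewrite Equation 1.7.4, using standard basis vectors:

$$D_i f(\mathbf{a}) = \lim_{h \to 0} \frac{f(\mathbf{a} + h\mathbf{e}_i) - f(\mathbf{a})}{h}, \tag{1.7.5}$$

since all the entries of $\mathbf{e}_i$ are 0 except for the $i$th entry, which is 1, so that

$$\mathbf{a} + h\mathbf{e}_i = \begin{pmatrix} a_1 \\ \vdots \\ a_i + h \\ \vdots \\ a_n \end{pmatrix}. \tag{1.7.6}$$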


Different notations for the partial derivative exist:




A notation often used in partial differential equations is

$$f_{x_i} = D_i f.$$


The partial derivative $D_i f(\mathbf{a})$ answers the question, how fast does the
function change when you vary the $i$th variable, keeping the other variables
constant? It is computed exactly the same way as derivatives are computed in first
year calculus. To take the partial derivative with respect to the first variable
of the function $f\begin{pmatrix} x \\ y \end{pmatrix} = xy$, one considers $y$ to be a constant and computes
$D_1 f = y$.

What is $D_1 f$ if $f\begin{pmatrix} x \\ y \end{pmatrix} = x^3 + x^2 y + y^2$? What is $D_2 f$? Check your answers
below.$^{18}$
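
Partial derivatives can also be estimated directly from the difference quotient of Equation 1.7.4, which gives a way to check a hand computation. The sketch below is illustrative only: the helper name `partial_quotient`, the 0-based variable index, the step size and the evaluation point $(1, 2)$ are all assumptions.

```python
def partial_quotient(f, point, i, h=1e-6):
    """Difference quotient of Equation 1.7.4 in the i-th variable (0-based index)."""
    shifted = list(point)
    shifted[i] += h
    return (f(*shifted) - f(*point)) / h


def f(x, y):
    return x**3 + x**2 * y + y**2


a = (1.0, 2.0)                           # hypothetical evaluation point
d1 = partial_quotient(f, a, 0)           # compare with 3x^2 + 2xy = 7 at (1, 2)
d2 = partial_quotient(f, a, 1)           # compare with x^2 + 2y = 5 at (1, 2)
print(d1, d2)
```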


Remark. There are at least four commonly used notations for partial
derivatives, the most common being

$$\frac{\partial f}{\partial x_1}, \quad \frac{\partial f}{\partial x_2}, \quad \dots, \quad \frac{\partial f}{\partial x_i}$$

for the partial derivative with respect to the first, second, ..., $i$th variable. We
prefer the notation $D_i f$, because it focuses on the important information: with
respect to which variable the partial derivative is being taken. (In problems
in economics, for example, where there may be no logical order to the
variables, one might assign letters rather than numbers: $D_w f$ for the “wages”
variable, $D_p f$ for the “prime rate,” etc.) It is also simpler to write and looks
better in matrices. But we will occasionally use the other notation in examples
and exercises, so that you will be familiar with it. △


Pitfalls of partial derivatives 

One eminent French mathematician, Adrien Douady, complains that the
notation for the partial derivative omits the most important information: which
variables are being kept constant.


$^{18}$ $D_1 f = 3x^2 + 2xy$ and $D_2 f = x^2 + 2y$.
For instance, consider the very real question: does increasing the minimum
wage increase or decrease the number of minimum wage jobs? This is a question
about the sign of

$$D_{\text{minimum wage}} f,$$

where $x$ is the economy and $f(x)$ = number of minimum wage jobs.

All notations for the partial derivative omit crucial information:
which variables are being kept constant. In modeling real
phenomena, it can be difficult even to know what all the variables are.
But if you don’t, your partial derivatives may be meaningless.

Note that the partial derivative of a vector-valued function is a
vector.

We use the standard expression, “vector-valued function,”
but note that the values of such a function could be points rather
than vectors; the difference in Equation 1.7.8 would still be a vector.

We give two versions of Equation 1.7.10 to illustrate the two notations
and to emphasize the fact that although we used $x$ and $y$ to
define the function, we can evaluate it at variables that look different.

But this partial derivative is meaningless until you state what is being held 
constant, and it isn’t at all easy to see what this means. Is public investment to 
be held constant, or the discount rate, or is the discount rate to be adjusted to 
keep total unemployment constant, as appears to be the present policy? There 
are many other variables to consider, who knows how many. You can see here 
why economists disagree about the sign of this partial derivative: it is hard if 
not impossible to say what the partial derivative is, never mind evaluating it. 

Similarly, if you are studying pressure of a gas as a function of temperature, 
it makes a big difference whether the volume of gas is kept constant or whether 
the gas is allowed to expand, for instance because it fills a balloon. 


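Partial derivatives of vector-valued functions

The definition of a partial derivative makes just as good sense for a
vector-valued function (a function from $\mathbb{R}^n$ to $\mathbb{R}^m$). In such a case, we evaluate the
limit for each component of $\mathbf{f}$, defining

$$D_i \mathbf{f}(\mathbf{a}) = \lim_{h \to 0} \frac{1}{h}\left( \mathbf{f}\begin{pmatrix} a_1 \\ \vdots \\ a_i + h \\ \vdots \\ a_n \end{pmatrix} - \mathbf{f}\begin{pmatrix} a_1 \\ \vdots \\ a_i \\ \vdots \\ a_n \end{pmatrix} \right) = \begin{bmatrix} D_i f_1(\mathbf{a}) \\ \vdots \\ D_i f_m(\mathbf{a}) \end{bmatrix}. \tag{1.7.8}$$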


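Example 1.7.4. Let $\mathbf{f} : \mathbb{R}^2 \to \mathbb{R}^3$ be given by

$$f_1\begin{pmatrix} x \\ y \end{pmatrix} = xy, \quad f_2\begin{pmatrix} x \\ y \end{pmatrix} = \sin(x + y), \quad f_3\begin{pmatrix} x \\ y \end{pmatrix} = x^2 - y^2,$$

written more simply

$$\mathbf{f}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} xy \\ \sin(x + y) \\ x^2 - y^2 \end{pmatrix}. \tag{1.7.9}$$

The partial derivative of $\mathbf{f}$ with respect to the first variable is

$$D_1 \mathbf{f}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{bmatrix} y \\ \cos(x + y) \\ 2x \end{bmatrix} \quad \text{or} \quad \frac{\partial \mathbf{f}}{\partial x_1}\begin{pmatrix} a \\ b \end{pmatrix} = \begin{bmatrix} b \\ \cos(a + b) \\ 2a \end{bmatrix}. \tag{1.7.10}$$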
What is the partial derivative with respect to the second variable?$^{19}$ What
are the partial derivatives at of the function



How would you rewrite the answer, using the notation of Equation 1.7.7? 20 


Directional derivatives 


Partial derivatives measure the rate at which $f$ varies as the variable
moves in the direction of the standard basis vectors. Directional
derivatives measure the rate at which $f$ varies when the variable
moves in any direction.

The partial derivative

$$\lim_{h \to 0} \frac{f(\mathbf{a} + h\mathbf{e}_i) - f(\mathbf{a})}{h} \tag{1.7.11}$$

measures the rate at which $f$ varies as the variable $\mathbf{x}$ moves from $\mathbf{a}$ in the
direction of the standard basis vector $\mathbf{e}_i$. It is natural to want to know how $f$
varies when the variable moves in any direction $\mathbf{v}$:


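Definition 1.7.5 (Directional derivative). The directional derivative of
$\mathbf{f}$ at $\mathbf{a}$ in the direction $\mathbf{v}$,

$$\lim_{h \to 0} \frac{\mathbf{f}(\mathbf{a} + h\mathbf{v}) - \mathbf{f}(\mathbf{a})}{h}, \tag{1.7.12}$$

measures the rate at which $\mathbf{f}$ varies when $\mathbf{x}$ moves from $\mathbf{a}$ in the direction $\mathbf{v}$.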


Some authors consider that only vectors $\mathbf{v}$ of length 1 can be
used in the definition of directional derivatives. We feel this is an
undesirable restriction, as it loses the essential linear character of the
directional derivative as a function of $\mathbf{v}$.


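Example 1.7.6 (Computing a directional derivative). Let us compute
the derivative in the direction $\mathbf{v} = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$ of the function $f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = xy\sin z$,
evaluated at the point $\mathbf{a} = \begin{pmatrix} 1 \\ 1 \\ \pi/2 \end{pmatrix}$. We have $h\mathbf{v} = h\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} = \begin{bmatrix} h \\ 2h \\ h \end{bmatrix}$, so
Equation 1.7.12 becomes

$$\lim_{h \to 0} \frac{1}{h}\Bigl( \underbrace{(1 + h)(1 + 2h)\sin\bigl(\tfrac{\pi}{2} + h\bigr)}_{f(\mathbf{a} + h\mathbf{v})} - \underbrace{\bigl(1 \cdot 1 \cdot \sin\tfrac{\pi}{2}\bigr)}_{f(\mathbf{a})} \Bigr). \tag{1.7.13}$$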
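Using the formula $\sin(a + b) = \sin a\cos b + \cos a\sin b$, this becomes

$$\begin{aligned}
&\lim_{h \to 0} \frac{1}{h}\Bigl( (1 + h)(1 + 2h)\bigl(\underbrace{\sin\tfrac{\pi}{2}}_{=1}\cos h + \underbrace{\cos\tfrac{\pi}{2}}_{=0}\sin h\bigr) - \sin\tfrac{\pi}{2} \Bigr) \\
&\quad = \lim_{h \to 0} \frac{1}{h}\bigl( (1 + 3h + 2h^2)\cos h - 1 \bigr) \\
&\quad = \lim_{h \to 0} \frac{1}{h}(\cos h - 1) + \lim_{h \to 0} \frac{1}{h}\,3h\cos h + \lim_{h \to 0} \frac{1}{h}\,2h^2\cos h = 0 + 3 + 0 = 3. \quad \triangle
\end{aligned} \tag{1.7.14}$$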
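
The limit computed in Equations 1.7.13 and 1.7.14 can be checked by evaluating the quotient of Equation 1.7.12 for small $h$. This is an illustrative sketch; the helper name `directional_quotient` and the chosen step sizes are assumptions.

```python
import math


def directional_quotient(f, a, v, h):
    """(f(a + h v) - f(a)) / h, the quotient in Equation 1.7.12."""
    moved = [ai + h * vi for ai, vi in zip(a, v)]
    return (f(*moved) - f(*a)) / h


def f(x, y, z):
    return x * y * math.sin(z)


a = (1.0, 1.0, math.pi / 2)
v = (1.0, 2.0, 1.0)
for h in (1e-2, 1e-4, 1e-6):
    print(h, directional_quotient(f, a, v, h))   # values approach 3
```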


The derivative in several variables 

Often we will want to see how a system changes when all the variables are
allowed to vary; we want to compute the whole derivative of the function. We
will see that this derivative consists of a matrix, called the Jacobian matrix,
whose entries are the partial derivatives of the function. We will also see that
if a function is differentiable, we can extrapolate all its directional derivatives
from the Jacobian matrix.

Definition 1.7.1 from first year calculus defines the derivative as the limit

$$\frac{\text{change in } f}{\text{change in } x}, \quad \text{i.e.,} \quad \frac{f(a + h) - f(a)}{h}, \tag{1.7.15}$$

as $h$ (the increment to the variable $x$) approaches 0. This does not generalize
well to higher dimensions. When $f$ is a function of several variables, then an
increment to the variable will be a vector, and we can’t divide by vectors.

It is tempting just to divide by $|\mathbf{h}|$, the length of $\mathbf{h}$:

$$f'(\mathbf{a}) = \lim_{\mathbf{h} \to \mathbf{0}} \frac{1}{|\mathbf{h}|}\bigl(f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a})\bigr). \tag{1.7.16}$$

This would allow us to rewrite Definition 1.7.1 in higher dimensions, since we
can divide by the length of a vector, which is a number. But this wouldn’t
work even in dimension 1, because the limit changes sign when $h$ approaches
0 from the left and from the right. In higher dimensions it’s much worse. All
the different directions from which $\mathbf{h}$ could approach $\mathbf{0}$ give different limits. By
dividing by $|\mathbf{h}|$ in Equation 1.7.16 we are canceling the magnitude but not the
direction.
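
The sign change is easy to see numerically in dimension 1. The sketch below uses $f(x) = x^2$ at $a = 1$, an example chosen here for illustration; the helper name `quotient_over_length` is likewise an assumption.

```python
def quotient_over_length(f, a, h):
    """(f(a + h) - f(a)) / |h|, the one-variable version of Equation 1.7.16."""
    return (f(a + h) - f(a)) / abs(h)


def square(x):
    return x * x


from_right = quotient_over_length(square, 1.0, 1e-6)    # close to +2
from_left = quotient_over_length(square, 1.0, -1e-6)    # close to -2
print(from_right, from_left)
```

The two one-sided values differ in sign, so the two-sided limit does not exist unless the derivative is 0.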

We will rewrite Definition 1.7.1 in a form that does generalize well. This definition will
emphasize the idea that a function $f$ is differentiable at a point $a$ if the increment
$\Delta f$ to the function is well approximated by a linear function of the increment
$h$ to the variable. This linear function is $f'(a)h$.
When we call $f'(a)h$ a linear function, we mean the linear function
that takes the variable $h$ and multiplies it by $f'(a)$, i.e., the
function $h \mapsto f'(a)h$ (to be read, “$h$ maps to $f'(a)h$”). Usually the
derivative of $f$ at $a$ is not a linear function of $a$. If $f(x) = \sin x$ or
$f(x) = x^3$, or just about anything except $f(x) = x^2$, then $f'(a)$ is not
a linear function of $a$. But $h \mapsto f'(a)h$ is a linear function of $h$.
For example, $h \mapsto (\sin x)h$ is a linear function of $h$, since
$(\sin x)(h_1 + h_2) = (\sin x)h_1 + (\sin x)h_2$.

Note the difference between $\mapsto$ (“maps to”) and $\to$ (“to”). The
first has a “pusher.”



Figure 1.7.1. 
The mapping 


'(;) -(■>) 
takes the shaded square in the 
square at top to the shaded area 
at bottom. 


Definition 1.7.7 (Alternate definition of the derivative). A function
$f$ is differentiable at $a$, with derivative $m$, if and only if

$$\lim_{h \to 0} \frac{1}{h}\Bigl( \underbrace{\bigl(f(a + h) - f(a)\bigr)}_{\Delta f} - \underbrace{(mh)}_{\text{linear function of } \Delta x} \Bigr) = 0. \tag{1.7.17}$$

The letter $\Delta$, named “delta,” denotes “change in”; $\Delta f$ is the change in
the function; $\Delta x = h$ is the change in the variable $x$. The function $mh$ that
multiplies $h$ by the derivative $m$ is thus a linear function of the change in $x$.

We are taking the limit as $h \to 0$, so $h$ is small, and dividing by it makes
things big; the numerator (the difference between the increment to the function
and the approximation of that increment) must be very small when $h$ is near
0 for the limit to be zero anyway (see Exercise 1.7.11).

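The following computation shows that Definition 1.7.7 is just a way of
restating Definition 1.7.1:

$$\lim_{h \to 0} \frac{1}{h}\Bigl( \bigl(f(a + h) - f(a)\bigr) - [f'(a)]h \Bigr) = \lim_{h \to 0} \Bigl( \underbrace{\frac{f(a + h) - f(a)}{h}}_{f'(a) \text{ by Equation 1.7.2}} - \frac{f'(a)h}{h} \Bigr) = f'(a) - f'(a) = 0. \tag{1.7.18}$$

Moreover, the linear function $h \mapsto f'(a)h$ is the only linear function satisfying
Equation 1.7.17. Indeed, any linear function of one variable can be written
$h \mapsto mh$, and

$$0 = \lim_{h \to 0} \frac{1}{h}\Bigl( \bigl(f(a + h) - f(a)\bigr) - mh \Bigr) = \lim_{h \to 0} \Bigl( \frac{f(a + h) - f(a)}{h} - \frac{mh}{h} \Bigr) = f'(a) - m, \tag{1.7.19}$$

so $f'(a) = m$.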


The derivative in several variables: the Jacobian matrix 

The point of rewriting the definition of the derivative is that with Definition
1.7.7, we can divide by $|h|$ rather than $h$: $m = f'(a)$ is also the unique number
such that

$$\lim_{h \to 0} \frac{1}{|h|}\Bigl( \underbrace{\bigl(f(a + h) - f(a)\bigr)}_{\Delta f} - \underbrace{(mh)}_{\text{linear function of } h} \Bigr) = 0. \tag{1.7.20}$$

It doesn’t matter if the limit changes sign, since the limit is 0; a number close
to 0 is close to 0 whether it is positive or negative.
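
A small numerical experiment makes the role of $m$ in Equation 1.7.20 concrete. The sketch below is illustrative: the function $\sin$, the base point 0.3 and the helper name `remainder_ratio` are assumptions. With $m = f'(a)$ the ratio shrinks with $|h|$ from either side; with any other $m$ (here $m = 0$) it settles near $\pm(f'(a) - m)$ and changes sign with $h$.

```python
import math


def remainder_ratio(f, a, m, h):
    """((f(a + h) - f(a)) - m h) / |h|, the quantity in Equation 1.7.20."""
    return ((f(a + h) - f(a)) - m * h) / abs(h)


a = 0.3                                   # hypothetical base point
good = math.cos(a)                        # the derivative of sin at a
for h in (1e-2, -1e-2, 1e-4, -1e-4):
    # With m = f'(a) the ratio shrinks with |h|; with m = 0 it stays near +/- cos(a).
    print(h, remainder_ratio(math.sin, a, good, h), remainder_ratio(math.sin, a, 0.0, h))
```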

Therefore we can generalize Equation 1.7.20 to mappings in higher dimensions,
like the one in Figure 1.7.1. As in the case of functions of one variable, the
key to understanding derivatives of functions of several variables is to think that
the increment to the function (the output) is approximately a linear function
of the increment to the variable (the input): i.e., that the increment

$$\Delta \mathbf{f} = \mathbf{f}(\mathbf{a} + \mathbf{h}) - \mathbf{f}(\mathbf{a}) \tag{1.7.21}$$

is approximately a linear function of the increment $\mathbf{h}$.

In the one-dimensional case, $\Delta f$ is well approximated by the linear function
$h \mapsto f'(a)h$. We saw in Section 1.3 that every linear transformation is given by
a matrix; the linear transformation $h \mapsto f'(a)h$ is given by multiplication by
the $1 \times 1$ matrix $[f'(a)]$.

For a mapping from $\mathbb{R}^n \to \mathbb{R}^m$, the role of this $1 \times 1$ matrix is played by an
$m \times n$ matrix composed of the partial derivatives of the mapping at $\mathbf{a}$. This
matrix is called the Jacobian matrix of the mapping $\mathbf{f}$; we denote it $[\mathbf{Jf}(\mathbf{a})]$:


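Definition 1.7.8 (Jacobian matrix). The Jacobian matrix of a function
$\mathbf{f}$ is the $m \times n$ matrix composed of the partial derivatives of $\mathbf{f}$ evaluated at
$\mathbf{a}$:

$$[\mathbf{Jf}(\mathbf{a})] = \begin{bmatrix} D_1 f_1(\mathbf{a}) & \cdots & D_n f_1(\mathbf{a}) \\ \vdots & & \vdots \\ D_1 f_m(\mathbf{a}) & \cdots & D_n f_m(\mathbf{a}) \end{bmatrix}. \tag{1.7.22}$$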


Note that in the Jacobian matrix we write the components of $\mathbf{f}$
from top to bottom, and the variables from left to right. The first
column gives the partial derivatives with respect to the first variable;
the second column gives the partial derivatives with respect to
the second variable, and so on.

Example 1.7.9. The Jacobian matrix of the function in Example 1.7.4 is

$$\left[\mathbf{Jf}\begin{pmatrix} x \\ y \end{pmatrix}\right] = \begin{bmatrix} y & x \\ \cos(x + y) & \cos(x + y) \\ 2x & -2y \end{bmatrix}. \tag{1.7.23}$$

The first column of the Jacobian matrix gives $D_1 \mathbf{f}$, the partial derivative with
respect to the first variable, $x$; the second column gives $D_2 \mathbf{f}$, the partial
derivative with respect to the second variable, $y$. △
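
The layout of the Jacobian matrix (components of $\mathbf{f}$ down the rows, variables across the columns) can be made concrete by building the matrix from difference quotients and comparing it with Equation 1.7.23. The following is a sketch under stated assumptions: the helper names, the forward-difference step and the evaluation point $(1, 2)$ are illustrative, and a forward difference only approximates each partial derivative.

```python
import math


def f(x, y):
    """The mapping of Example 1.7.4."""
    return (x * y, math.sin(x + y), x * x - y * y)


def numerical_jacobian(f, point, h=1e-6):
    """Rows are components of f, columns are variables (forward differences)."""
    base = f(*point)
    columns = []
    for i in range(len(point)):
        shifted = list(point)
        shifted[i] += h
        moved = f(*shifted)
        columns.append([(mv - bv) / h for mv, bv in zip(moved, base)])
    return [list(row) for row in zip(*columns)]   # transpose columns into rows


def exact_jacobian(x, y):
    """Equation 1.7.23."""
    return [[y, x], [math.cos(x + y), math.cos(x + y)], [2 * x, -2 * y]]


point = (1.0, 2.0)                        # hypothetical evaluation point
approx = numerical_jacobian(f, point)
exact = exact_jacobian(*point)
for row_a, row_e in zip(approx, exact):
    print(row_a, row_e)
```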


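What is the Jacobian matrix of the function $\mathbf{f}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x^3 y \\ 2x^2 y^2 \\ xy \end{pmatrix}$? Check
your answer below.$^{21}$


$^{21}$ $\left[\mathbf{Jf}\begin{pmatrix} x \\ y \end{pmatrix}\right] = \begin{bmatrix} 3x^2 y & x^3 \\ 4xy^2 & 4x^2 y \\ y & x \end{bmatrix}$. The first column is $D_1 \mathbf{f}$ (the partial derivatives
with respect to the first variable); the second is $D_2 \mathbf{f}$. The first row gives the partial
derivatives for $f\begin{pmatrix} x \\ y \end{pmatrix} = x^3 y$; the second row gives the partial derivatives for
$f\begin{pmatrix} x \\ y \end{pmatrix} = 2x^2 y^2$, and the third gives the partial derivatives for $f\begin{pmatrix} x \\ y \end{pmatrix} = xy$.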
When is the Jacobian matrix of a function its derivative?

The fact that the derivative of a function in several variables is
represented by the Jacobian matrix is the reason why linear algebra
is a prerequisite to multivariable calculus.

We would like to say that if the Jacobian matrix exists, then it is the derivative
of $\mathbf{f}$. That is, we would like to say that the increment $\mathbf{f}(\mathbf{a} + \mathbf{h}) - \mathbf{f}(\mathbf{a})$ is
approximately $[\mathbf{Jf}(\mathbf{a})]\mathbf{h}$, in the sense that

$$\lim_{\mathbf{h} \to \mathbf{0}} \frac{1}{|\mathbf{h}|}\Bigl( \bigl(\mathbf{f}(\mathbf{a} + \mathbf{h}) - \mathbf{f}(\mathbf{a})\bigr) - [\mathbf{Jf}(\mathbf{a})]\mathbf{h} \Bigr) = \mathbf{0}. \tag{1.7.24}$$

This is the higher-dimensional analog of Equation 1.7.17, which we proved
in one dimension. Usually it is true in higher dimensions: you can calculate
the derivative of a function with several variables by computing its partial
derivatives, using techniques you already know, and putting them in a matrix.

Unfortunately, it isn’t always true: it is possible for all partial derivatives of
a function $\mathbf{f}$ to exist, and yet for $\mathbf{f}$ not to be differentiable!

We will examine the issue of such pathological functions in Section 1.9.

The best we can do without extra hypotheses is the following statement.


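Theorem 1.7.10 (The Jacobian matrix and the derivative). If there
is any linear transformation $L$ such that

$$\lim_{\mathbf{h} \to \mathbf{0}} \frac{1}{|\mathbf{h}|}\Bigl( \bigl(\mathbf{f}(\mathbf{a} + \mathbf{h}) - \mathbf{f}(\mathbf{a})\bigr) - L(\mathbf{h}) \Bigr) = \mathbf{0}, \tag{1.7.25}$$

then all partial derivatives of $\mathbf{f}$ at $\mathbf{a}$ exist, and the matrix representing $L$ is
$[\mathbf{Jf}(\mathbf{a})]$. In particular, such a linear transformation is unique.


Definition 1.7.11 (Derivative). If the linear transformation of Theorem
1.7.10 exists, $\mathbf{f}$ is differentiable at $\mathbf{a}$, and the linear transformation represented
by $[\mathbf{Jf}(\mathbf{a})]$ is its derivative $[\mathbf{Df}(\mathbf{a})]$: the derivative of $\mathbf{f}$ at $\mathbf{a}$.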
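
To see how partial derivatives can exist at a point where the limit of Equation 1.7.24 fails, here is a sketch using a standard example that is not taken from this section: the function $g(x, y) = x^2 y / (x^2 + y^2)$ with $g(0, 0) = 0$. The function and the helper names are assumptions introduced purely for illustration. Both partial derivatives at the origin are 0, so the Jacobian matrix there is $[0, 0]$, but along the diagonal the remainder divided by $|\mathbf{h}|$ stays at $1/(2\sqrt{2})$.

```python
import math


def g(x, y):
    """Hypothetical example, not from the text: x^2 y / (x^2 + y^2), with g(0, 0) = 0."""
    if x == 0 and y == 0:
        return 0.0
    return x * x * y / (x * x + y * y)


def remainder_over_length(h, k):
    """(g(h, k) - g(0, 0) - [0, 0](h, k)) / |(h, k)|, as in Equation 1.7.24."""
    return (g(h, k) - g(0.0, 0.0)) / math.hypot(h, k)


# Both partial quotients at the origin vanish, so the Jacobian matrix is [0, 0].
print(g(1e-6, 0.0) / 1e-6, g(0.0, 1e-6) / 1e-6)
# Along the diagonal the ratio stays at 1/(2*sqrt(2)) instead of tending to 0.
for t in (1e-1, 1e-3, 1e-6):
    print(t, remainder_over_length(t, t))
```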


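Remark. It is essential to remember that the derivative $[\mathbf{Df}(\mathbf{a})]$ is a matrix
(in the case of a function $f : \mathbb{R} \to \mathbb{R}$, a $1 \times 1$ matrix, i.e., a number). It is
convenient to write $[\mathbf{Df}(\mathbf{a})]$ rather than writing the Jacobian matrix in full:

$$[\mathbf{Df}(\mathbf{a})] = [\mathbf{Jf}(\mathbf{a})] = \begin{bmatrix} D_1 f_1(\mathbf{a}) & \cdots & D_n f_1(\mathbf{a}) \\ \vdots & & \vdots \\ D_1 f_m(\mathbf{a}) & \cdots & D_n f_m(\mathbf{a}) \end{bmatrix}. \tag{1.7.26}$$

But when you see $[\mathbf{Df}(\mathbf{a})]$, you should always be aware of its dimensions. Given
a function $\mathbf{f} : \mathbb{R}^3 \to \mathbb{R}^2$, what are the dimensions of its derivative at $\mathbf{a}$, $[\mathbf{Df}(\mathbf{a})]$?
Check your answer below.$^{22}$ △


$^{22}$ Since $\mathbf{f} : \mathbb{R}^3 \to \mathbb{R}^2$ takes a point in $\mathbb{R}^3$ and gives a point in $\mathbb{R}^2$, similarly, $[\mathbf{Df}(\mathbf{a})]$
takes a vector in $\mathbb{R}^3$ and gives a vector in $\mathbb{R}^2$. Therefore $[\mathbf{Df}(\mathbf{a})]$ is a $2 \times 3$ matrix.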
We will prove Theorem 1.7.10 after some further discussion of directional
derivatives, and a couple of extended examples of the Jacobian matrix as
derivative.


Extrapolating directional derivatives from partial derivatives 

If a function is differentiable, we can extrapolate all its directional derivatives 
from its partial derivatives — i.e., from its derivative: 


In Equation 1.7.27 for the directional derivative, we use the
increment vector $h\mathbf{v}$ rather than $\mathbf{h}$ because we are measuring the
derivative only in the direction of a particular vector $\mathbf{v}$.

Example: You are standing at the origin on a hill with height
$f\begin{pmatrix} x \\ y \end{pmatrix} = 3x + 8y$. When you step in direction $\mathbf{v} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$,
your rate of ascent is

$$[\mathbf{D}f(\mathbf{0})]\mathbf{v} = [3, 8]\begin{bmatrix} 1 \\ 2 \end{bmatrix} = 19.$$


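Proposition 1.7.12 (Computing directional derivatives from the
derivative). If $\mathbf{f}$ is differentiable at $\mathbf{a}$, then all directional derivatives of $\mathbf{f}$ at
$\mathbf{a}$ exist; the directional derivative in the direction $\mathbf{v}$ is given by the formula

$$\lim_{h \to 0} \frac{\mathbf{f}(\mathbf{a} + h\mathbf{v}) - \mathbf{f}(\mathbf{a})}{h} = [\mathbf{Df}(\mathbf{a})]\mathbf{v}. \tag{1.7.27}$$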


Example 1.7.13 (Computing a directional derivative from the Jacobian
matrix). Let us use Proposition 1.7.12 to compute the directional
derivative of Example 1.7.6. The partial derivatives of $f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = xy\sin z$ are
$D_1 f = y\sin z$, $D_2 f = x\sin z$ and $D_3 f = xy\cos z$, so its derivative evaluated
at the point $\begin{pmatrix} 1 \\ 1 \\ \pi/2 \end{pmatrix}$ is the one-row matrix $[1, 1, 0]$. (The commas may be
misleading but omitting them might lead to confusion with multiplication.)
Multiplying this by the vector $\mathbf{v} = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$ does indeed give the answer 3, which
is what we got before. △
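
The same computation takes only a few lines of code. This is an illustrative sketch; the helper names `derivative_row` and `apply_row` are assumptions.

```python
import math


def derivative_row(x, y, z):
    """One-row Jacobian matrix [D_1 f, D_2 f, D_3 f] of f(x, y, z) = x y sin z."""
    return [y * math.sin(z), x * math.sin(z), x * y * math.cos(z)]


def apply_row(row, v):
    """Product of a one-row matrix with a column vector."""
    return sum(r * c for r, c in zip(row, v))


row = derivative_row(1.0, 1.0, math.pi / 2)   # [1, 1, 0] up to rounding
print(row, apply_row(row, (1.0, 2.0, 1.0)))
```

Because the derivative acts linearly on $\mathbf{v}$, doubling $\mathbf{v}$ doubles the directional derivative, which is the point made in the margin about not restricting to vectors of length 1.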


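Proof of Proposition 1.7.12. The expression

$$\mathbf{r}(\mathbf{h}) = \bigl(\mathbf{f}(\mathbf{a} + \mathbf{h}) - \mathbf{f}(\mathbf{a})\bigr) - [\mathbf{Df}(\mathbf{a})]\mathbf{h} \tag{1.7.28}$$

defines the “remainder” $\mathbf{r}(\mathbf{h})$ (the difference between the increment to the
function and its linear approximation) as a function of the increment $\mathbf{h}$. The
hypothesis that $\mathbf{f}$ is differentiable at $\mathbf{a}$ says that

$$\lim_{\mathbf{h} \to \mathbf{0}} \frac{\mathbf{r}(\mathbf{h})}{|\mathbf{h}|} = \mathbf{0}. \tag{1.7.29}$$

Substituting $h\mathbf{v}$ for $\mathbf{h}$ in Equation 1.7.28, we find

$$\mathbf{r}(h\mathbf{v}) = \mathbf{f}(\mathbf{a} + h\mathbf{v}) - \mathbf{f}(\mathbf{a}) - [\mathbf{Df}(\mathbf{a})]h\mathbf{v}, \tag{1.7.30}$$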
and dividing by $h$ gives

$$|\mathbf{v}|\,\frac{\mathbf{r}(h\mathbf{v})}{h|\mathbf{v}|} = \frac{\mathbf{f}(\mathbf{a} + h\mathbf{v}) - \mathbf{f}(\mathbf{a})}{h} - [\mathbf{Df}(\mathbf{a})]\mathbf{v}, \tag{1.7.31}$$

where we have used the linearity of the derivative to write

$$\frac{[\mathbf{Df}(\mathbf{a})]h\mathbf{v}}{h} = \frac{h[\mathbf{Df}(\mathbf{a})]\mathbf{v}}{h} = [\mathbf{Df}(\mathbf{a})]\mathbf{v}. \tag{1.7.32}$$

The term

$$\frac{\mathbf{r}(h\mathbf{v})}{h|\mathbf{v}|} \tag{1.7.33}$$

on the left side of Equation 1.7.31 has limit 0 as $h \to 0$ by Equation 1.7.29, so

$$\lim_{h \to 0} \frac{\mathbf{f}(\mathbf{a} + h\mathbf{v}) - \mathbf{f}(\mathbf{a})}{h} - [\mathbf{Df}(\mathbf{a})]\mathbf{v} = \mathbf{0}. \quad \square \tag{1.7.34}$$

To follow Equation 1.7.32, recall that for any linear transformation $T$, we have
$T(a\mathbf{v}) = aT(\mathbf{v})$. The derivative $[\mathbf{Df}(\mathbf{a})]$ gives a linear transformation, so
$[\mathbf{Df}(\mathbf{a})]h\mathbf{v} = h[\mathbf{Df}(\mathbf{a})]\mathbf{v}$.

Once we know the partial derivatives of $\mathbf{f}$, which measure rate of
change in the direction of the standard basis vectors, we can compute
the derivatives in any direction.

This should not come as a surprise. As we saw in Example
1.3.15, the matrix for any linear transformation is formed by seeing
what the transformation does to the standard basis vectors: The
$i$th column of the matrix $[T]$ is $T(\mathbf{e}_i)$. One can then see what $T$
does to any vector $\mathbf{v}$ by multiplying $[T]\mathbf{v}$. The Jacobian matrix is
the matrix for the “rate of change” transformation, formed by seeing
what that transformation does to the standard basis vectors.


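Example 1.7.14 (The Jacobian matrix of a function $\mathbf{f} : \mathbb{R}^2 \to \mathbb{R}^2$). Let’s see, for a fairly simple nonlinear mapping from $\mathbb{R}^2$ to $\mathbb{R}^2$, that the Jacobian matrix does indeed provide the desired approximation of the change in the mapping. The Jacobian matrix of the mapping

\[ \mathbf{f}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} xy \\ x^2 - y^2 \end{pmatrix} \quad \text{is} \quad \left[ J\mathbf{f}\begin{pmatrix} x \\ y \end{pmatrix} \right] = \begin{bmatrix} y & x \\ 2x & -2y \end{bmatrix}, \tag{1.7.35} \]

since the partial derivative of xy with regard to x is y, the partial derivative of xy with regard to y is x, and so on. Our increment vector will be $\begin{bmatrix} h \\ k \end{bmatrix}$. Plugging this vector, and the Jacobian matrix of Equation 1.7.35, into Equation 1.7.24, we get

\[ \lim_{\begin{bmatrix} h \\ k \end{bmatrix} \to \begin{bmatrix} 0 \\ 0 \end{bmatrix}} \frac{1}{\sqrt{h^2 + k^2}} \left( \mathbf{f}\begin{pmatrix} a + h \\ b + k \end{pmatrix} - \mathbf{f}\begin{pmatrix} a \\ b \end{pmatrix} - \underbrace{\begin{bmatrix} b & a \\ 2a & -2b \end{bmatrix}}_{\text{Jacobian matrix}} \begin{bmatrix} h \\ k \end{bmatrix} \right) = \begin{bmatrix} 0 \\ 0 \end{bmatrix}. \tag{1.7.36} \]

The $\sqrt{h^2 + k^2}$ at left is $|\vec{h}|$, the length of the increment vector (as defined in Equation 1.4.7). Evaluating $\mathbf{f}$ at $\begin{pmatrix} a + h \\ b + k \end{pmatrix}$ and at $\begin{pmatrix} a \\ b \end{pmatrix}$, we have

\[ \lim_{\begin{bmatrix} h \\ k \end{bmatrix} \to \begin{bmatrix} 0 \\ 0 \end{bmatrix}} \frac{1}{\sqrt{h^2 + k^2}} \left( \begin{pmatrix} (a + h)(b + k) \\ (a + h)^2 - (b + k)^2 \end{pmatrix} - \begin{pmatrix} ab \\ a^2 - b^2 \end{pmatrix} - \begin{bmatrix} bh + ak \\ 2ah - 2bk \end{bmatrix} \right) = \begin{bmatrix} 0 \\ 0 \end{bmatrix}. \tag{1.7.37} \]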





After some computations the left-hand side becomes 


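For example, at $\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$, the function of Equation 1.7.35 gives $\mathbf{f}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$, and we are asking whether

\[ \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \begin{bmatrix} 1 & 1 \\ 2 & -2 \end{bmatrix} \begin{bmatrix} h \\ k \end{bmatrix} = \begin{pmatrix} 1 + h + k \\ 2h - 2k \end{pmatrix} \]

is a good approximation to

\[ \mathbf{f}\begin{pmatrix} 1 + h \\ 1 + k \end{pmatrix} = \begin{pmatrix} 1 + h + k + hk \\ 2h - 2k + h^2 - k^2 \end{pmatrix}. \]

(That is, we are asking whether the difference is smaller than linear. In this case, it clearly is: the first entry differs by the quadratic term $hk$, the second entry by the quadratic term $h^2 - k^2$.)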


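\[ \lim_{\begin{bmatrix} h \\ k \end{bmatrix} \to \begin{bmatrix} 0 \\ 0 \end{bmatrix}} \frac{1}{\sqrt{h^2 + k^2}} \begin{bmatrix} ab + ak + bh + \underline{hk} - ab - bh - ak \\ a^2 + 2ah + \underline{h^2} - b^2 - 2bk - \underline{k^2} - a^2 + b^2 - 2ah + 2bk \end{bmatrix}, \tag{1.7.38} \]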


which looks forbidding. But all the terms of the vector cancel out except those 
that are underlined, giving us 


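\[ \lim_{\begin{bmatrix} h \\ k \end{bmatrix} \to \begin{bmatrix} 0 \\ 0 \end{bmatrix}} \frac{1}{\sqrt{h^2 + k^2}} \begin{bmatrix} hk \\ h^2 - k^2 \end{bmatrix} \overset{?}{=} \begin{bmatrix} 0 \\ 0 \end{bmatrix}. \tag{1.7.39} \]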


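Indeed, the hypotenuse of a triangle is longer than either of its other sides, $0 \le |h| \le \sqrt{h^2 + k^2}$ and $0 \le |k| \le \sqrt{h^2 + k^2}$, so

\[ 0 \le \left| \frac{hk}{\sqrt{h^2 + k^2}} \right| = \left| \frac{h}{\sqrt{h^2 + k^2}} \right| |k| \le |k|, \tag{1.7.40} \]

and we have

\[ 0 \le \lim_{\begin{bmatrix} h \\ k \end{bmatrix} \to \begin{bmatrix} 0 \\ 0 \end{bmatrix}} \left| \frac{hk}{\sqrt{h^2 + k^2}} \right| \le \lim_{\begin{bmatrix} h \\ k \end{bmatrix} \to \begin{bmatrix} 0 \\ 0 \end{bmatrix}} |k| = 0, \tag{1.7.41} \]

so the limit is squeezed between 0 and 0. Similarly,

\[ 0 \le \left| \frac{h^2 - k^2}{\sqrt{h^2 + k^2}} \right| \le |h| \left| \frac{h}{\sqrt{h^2 + k^2}} \right| + |k| \left| \frac{k}{\sqrt{h^2 + k^2}} \right| \le |h| + |k|, \tag{1.7.42} \]

so the second entry is also squeezed between 0 and 0:

\[ 0 \le \lim_{\begin{bmatrix} h \\ k \end{bmatrix} \to \begin{bmatrix} 0 \\ 0 \end{bmatrix}} \left| \frac{h^2 - k^2}{\sqrt{h^2 + k^2}} \right| \le \lim_{\begin{bmatrix} h \\ k \end{bmatrix} \to \begin{bmatrix} 0 \\ 0 \end{bmatrix}} (|h| + |k|) = 0 + 0 = 0. \tag{1.7.43} \]

When we speak of Mat(n, n) as the space of n × n matrices we mean that we can identify an n × n matrix with an element of $\mathbb{R}^{n^2}$. In Section 2.6 we will see that Mat(n, n) is an example of an abstract vector space, and we will be more precise about what it means to identify such a space with an appropriate $\mathbb{R}^N$.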
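Returning to Example 1.7.14, the limit in Equation 1.7.36 can be watched numerically. The sketch below is illustrative only: the base point (1, 1), the direction of the increment and the helper name `scaled_remainder` are choices made here, not taken from the text.

```python
import math

def f(x, y):
    # The mapping of Equation 1.7.35
    return (x * y, x * x - y * y)

def jacobian_times(a, b, h, k):
    # [Jf(a, b)] = [[b, a], [2a, -2b]] applied to the increment (h, k)
    return (b * h + a * k, 2 * a * h - 2 * b * k)

def scaled_remainder(a, b, h, k):
    # |f(a + h) - f(a) - [Jf(a)]h| / |h|, the quantity whose limit should be 0
    fa = f(a, b)
    fah = f(a + h, b + k)
    lin = jacobian_times(a, b, h, k)
    r0 = fah[0] - fa[0] - lin[0]
    r1 = fah[1] - fa[1] - lin[1]
    return math.hypot(r0, r1) / math.hypot(h, k)

a, b = 1.0, 1.0
# Increments (t, 2t) for t = 1e-1, ..., 1e-5
ratios = [scaled_remainder(a, b, 10.0 ** -n, 2 * 10.0 ** -n) for n in range(1, 6)]
print(ratios)
```

Because the remainder is exactly the quadratic vector with entries $hk$ and $h^2 - k^2$, each ratio should be roughly a tenth of the previous one.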


If you wonder how we found the result of Equation 1.7.45, look at the comment accompanying Equation 1.7.48.

We could express the derivative of the function $f(x) = x^2$ as $f'(x) : h \mapsto 2xh$.

Equation 1.7.48 shows that $AH + HA$ is exactly the linear terms in H of the increment to the function, so that subtracting them leaves only higher degree terms; i.e., $AH + HA$ is the derivative.


Example 1.7.15 (The derivative of a matrix squared). In most serious calculus texts, the first example of a derivative is that if $f(x) = x^2$, then $f'(x) = 2x$, as shown in Equation 1.7.3. Let us compute the same thing when a matrix, not a number, is being squared. This could be written as a function $\mathbb{R}^{n^2} \to \mathbb{R}^{n^2}$, and you are asked to spell this out for n = 2 and n = 3 in Exercise 1.7.15. But the expression that you get is very unwieldy as soon as n > 2, as you will see if you try to solve the exercise. This is one time when a linear transformation is easier to deal with than the corresponding matrix. It is much easier to denote by Mat(n, n) the space of n × n matrices, and to consider the mapping S : Mat(n, n) → Mat(n, n) given by

\[ S(A) = A^2. \tag{1.7.44} \]

(The S stands for “square.”)

In this case we will be able to compute the derivative without computing the Jacobian matrix. We shall see that S is differentiable and that its derivative $[DS(A)]$ is the linear transformation that maps H to $AH + HA$:

\[ [DS(A)]H = AH + HA, \quad \text{also written} \quad [DS(A)] : H \mapsto AH + HA. \tag{1.7.45} \]

Since the increment is a matrix, we denote it H. Note that if matrix multiplication were commutative, we could denote this derivative $2AH$ or $2HA$, very much like the derivative $f'(x) = 2x$ for the function $f(x) = x^2$.

To make sense of Equation 1.7.45, a first thing to realize is that the map

\[ [DS(A)] : \text{Mat}(n, n) \to \text{Mat}(n, n), \quad H \mapsto AH + HA \tag{1.7.46} \]

is a linear transformation. Exercise 2.6.4 asks you to check this, along with some extensions.

Now, how do we prove Equation 1.7.45?

Well, the assertion is that

\[ \lim_{H \to 0} \frac{1}{|H|} \Big| \underbrace{\big( S(A + H) - S(A) \big)}_{\text{increment to mapping}} - \underbrace{(AH + HA)}_{\text{linear function of increment to variable}} \Big| = 0. \tag{1.7.47} \]

Since $S(A) = A^2$,

\[ \begin{aligned} |S(A + H) - S(A) - (AH + HA)| &= |(A + H)^2 - A^2 - AH - HA| \\ &= |A^2 + AH + HA + H^2 - A^2 - AH - HA| \\ &= |H^2|. \end{aligned} \tag{1.7.48} \]

So the object is to show that

\[ \lim_{H \to 0} \frac{|H^2|}{|H|} = 0. \tag{1.7.49} \]

Since $|H^2| \le |H|^2$ (by Proposition 1.4.11), this is true. △

Exercise 1.7.18 asks you to prove that the derivative $AH + HA$ is the “same” as the Jacobian matrix computed with partial derivatives, for 2 × 2 matrices. Much of the difficulty is in understanding S as a mapping from $\mathbb{R}^4$ to $\mathbb{R}^4$.
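The claim that $AH + HA$ captures the linear part of the increment of $S(A) = A^2$ can be tested on 2 × 2 matrices. This is a rough sketch under assumptions made here: the matrices `A` and `H0` are arbitrary illustrative choices, matrices are stored as nested lists, and `length` is the square root of the sum of the squared entries, the matrix length used in Proposition 1.4.11.

```python
def matmul(X, Y):
    # Product of two 2x2 matrices stored as nested lists
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]

def sub(X, Y):
    return [[X[i][j] - Y[i][j] for j in range(2)] for i in range(2)]

def length(X):
    # Square root of the sum of the squares of the entries
    return sum(x * x for row in X for x in row) ** 0.5

def S(X):
    return matmul(X, X)

A = [[1.0, 2.0], [3.0, 4.0]]     # illustrative choice
H0 = [[0.5, -1.0], [2.0, 0.25]]  # direction of the increment, also illustrative

ratios = []
for n in range(1, 5):
    t = 10.0 ** -n
    H = [[t * x for x in row] for row in H0]
    increment = sub(S(add(A, H)), S(A))            # S(A + H) - S(A)
    linear_part = add(matmul(A, H), matmul(H, A))  # AH + HA
    remainder = sub(increment, linear_part)        # should equal H^2
    ratios.append(length(remainder) / length(H))

print(ratios)
```

Each ratio is $|H^2| / |H|$ up to rounding, so it should stay below $|H|$ and shrink by about a factor of ten per step.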

Here is another example of the same kind of thing. Recall that if $f(x) = 1/x$, then $f'(a) = -1/a^2$. Proposition 1.7.16 generalizes this to matrices.


Exercise 1.7.20 asks you to compute the Jacobian matrix and verify Proposition 1.7.16 in the case of 2 × 2 matrices. It should be clear from the exercise that using this approach even for 3 × 3 matrices would be extremely unpleasant.


Proposition 1.7.16. The set of invertible matrices is open in Mat(n, n), and if $f(A) = A^{-1}$, then f is differentiable, and

\[ [Df(A)]H = -A^{-1}HA^{-1}. \tag{1.7.50} \]

Note the interesting way in which this reduces to $f'(a)h = -h/a^2$ in one dimension.


Proof. (Optional) We proved that the set of invertible matrices is open in Corollary 1.5.33. Now we need to show that

\[ \lim_{H \to 0} \frac{1}{|H|} \Big| \underbrace{\big( (A + H)^{-1} - A^{-1} \big)}_{\text{increment to mapping}} - \underbrace{\big( -A^{-1}HA^{-1} \big)}_{\text{linear function of } H} \Big| = 0. \tag{1.7.51} \]


Our strategy (as in the proof of Corollary 1.5.33) will be to use Proposition 1.5.31, which says that if B is a square matrix such that $|B| < 1$, then the series $I + B + B^2 + \cdots$ converges to $(I - B)^{-1}$. (We restated the proposition here changing the A’s to B’s to avoid confusion with our current A’s.) We also use Proposition 1.2.15 concerning the inverse of a product of invertible matrices. Since $H \to 0$ in Equation 1.7.51, we can assume that $|A^{-1}H| < 1$, so treating $(I + A^{-1}H)^{-1}$ as the sum of the series is justified. This gives the following computation:


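\[ \begin{aligned} (A + H)^{-1} &= \big( A(I + A^{-1}H) \big)^{-1} = (I + A^{-1}H)^{-1}A^{-1} \qquad \text{(Prop. 1.2.15)} \\ &= \underbrace{\big( I - (-A^{-1}H) \big)^{-1}}_{\text{sum of series in line below}} A^{-1} \\ &= \underbrace{\big( I + (-A^{-1}H) + (-A^{-1}H)^2 + \cdots \big)}_{\text{series } I + B + B^2 + \cdots, \text{ where } B = -A^{-1}H} A^{-1} \end{aligned} \]

(Now we consider the first term, second term, and remaining terms:)

\[ (A + H)^{-1} = \underbrace{A^{-1}}_{\text{1st}} - \underbrace{A^{-1}HA^{-1}}_{\text{2nd}} + \underbrace{\big( (-A^{-1}H)^2 + (-A^{-1}H)^3 + \cdots \big) A^{-1}}_{\text{others}}. \tag{1.7.52} \]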

Switching from matrices to lengths of matrices in Equation 1.7.54 has several important consequences. First, it allows us, via Proposition 1.4.11, to establish an inequality. Next, it explains why we could multiply the $|A^{-1}|^2$ and $|A^{-1}|$ to get $|A^{-1}|^3$: matrix multiplication isn’t commutative, but multiplication of matrix lengths is, since the length of a matrix is a number. Finally, it explains why the sum of the series is a fraction rather than a matrix inverse.

It may not be immediately obvious why we did the computations above. The point is that subtracting $(A^{-1} - A^{-1}HA^{-1})$ from both sides of Equation 1.7.52 gives us, on the left, the quantity that really interests us: the difference between the increment of the function and its approximation by the linear function of the increment to the variable:

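\[ \begin{aligned} \underbrace{\big( (A + H)^{-1} - A^{-1} \big)}_{\text{increment to function}} - \underbrace{\big( -A^{-1}HA^{-1} \big)}_{\text{linear function of increment to variable}} &= \big( (-A^{-1}H)^2 + (-A^{-1}H)^3 + \cdots \big) A^{-1} \\ &= (A^{-1}H)^2 \big( I + (-A^{-1}H) + (-A^{-1}H)^2 + \cdots \big) A^{-1}. \end{aligned} \tag{1.7.53} \]

Now applying Proposition 1.4.11 to the right-hand side gives us

\[ \big| (A + H)^{-1} - A^{-1} + A^{-1}HA^{-1} \big| \le |A^{-1}H|^2 \, \big| I + (-A^{-1}H) + (-A^{-1}H)^2 + \cdots \big| \, |A^{-1}|, \]

and the triangle inequality gives

\[ \begin{aligned} &\le |A^{-1}H|^2 \, |A^{-1}| \underbrace{\big( 1 + |-A^{-1}H| + |-A^{-1}H|^2 + \cdots \big)}_{\text{convergent geometric series}} \\ &\le |H|^2 \, |A^{-1}|^3 \, \frac{1}{1 - |A^{-1}H|}. \end{aligned} \tag{1.7.54} \]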


Recall (Exercise 1.5.3) that the triangle inequality applies to convergent infinite sums.

Now suppose H so small that $|A^{-1}H| < 1/2$, so that

\[ \frac{1}{1 - |A^{-1}H|} \le 2. \tag{1.7.55} \]

We see that

\[ \lim_{H \to 0} \frac{1}{|H|} \big| (A + H)^{-1} - A^{-1} + A^{-1}HA^{-1} \big| \le \lim_{H \to 0} 2|H| \, |A^{-1}|^3 = 0. \quad \Box \tag{1.7.56} \]
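Proposition 1.7.16 can be sanity-checked on 2 × 2 matrices, where the inverse has a closed formula. The following sketch makes several assumptions of its own: the matrices `A` and `H0` are arbitrary invertible examples, the helpers are written for the 2 × 2 case only, and a shrinking ratio is numerical evidence, not a proof.

```python
def matmul(X, Y):
    # Product of two 2x2 matrices stored as nested lists
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]

def sub(X, Y):
    return [[X[i][j] - Y[i][j] for j in range(2)] for i in range(2)]

def length(X):
    # Square root of the sum of the squares of the entries
    return sum(x * x for row in X for x in row) ** 0.5

def inv(X):
    # Closed-form inverse of a 2x2 matrix with nonzero determinant
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / det, -X[0][1] / det],
            [-X[1][0] / det, X[0][0] / det]]

A = [[2.0, 1.0], [1.0, 3.0]]     # illustrative invertible matrix
H0 = [[0.5, -1.0], [2.0, 0.25]]  # illustrative increment direction
A_inv = inv(A)

ratios = []
for n in range(1, 5):
    t = 10.0 ** -n
    H = [[t * x for x in row] for row in H0]
    increment = sub(inv(add(A, H)), A_inv)            # (A + H)^-1 - A^-1
    minus_linear = matmul(matmul(A_inv, H), A_inv)    # A^-1 H A^-1
    remainder = add(increment, minus_linear)          # increment - (-A^-1 H A^-1)
    ratios.append(length(remainder) / length(H))

print(ratios)
```

The printed ratios correspond to the left-hand side of Equation 1.7.51 for smaller and smaller H, and should decrease roughly in proportion to $|H|$.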


Proving Theorem 1.7.10 about the Jacobian matrix

Now it’s time that we proved Theorem 1.7.10. We restate it here: 


Theorem 1.7.10 (The Jacobian matrix as derivative). If there is any linear transformation L such that

\[ \lim_{\vec{h} \to \vec{0}} \frac{\big( \mathbf{f}(\mathbf{a} + \vec{h}) - \mathbf{f}(\mathbf{a}) \big) - L(\vec{h})}{|\vec{h}|} = \vec{0}, \tag{1.7.25} \]

then all partial derivatives of $\mathbf{f}$ at $\mathbf{a}$ exist, and the matrix representing L is $[J\mathbf{f}(\mathbf{a})]$.

Proof. We know (Theorem 1.3.14) that the linear transformation L is represented by the matrix whose ith column is $L\vec{e}_i$, so we need to show that

\[ L\vec{e}_i = D_i\mathbf{f}(\mathbf{a}), \tag{1.7.57} \]

$D_i\mathbf{f}(\mathbf{a})$ being by definition the ith column of the Jacobian matrix $[J\mathbf{f}(\mathbf{a})]$.

This proof proves that if there is a derivative, it is unique, since a linear transformation has just one matrix.

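Equation 1.7.25 is true for any vector $\vec{h}$, including $t\vec{e}_i$, where t is a number:

\[ \lim_{t\vec{e}_i \to \vec{0}} \frac{\big( \mathbf{f}(\mathbf{a} + t\vec{e}_i) - \mathbf{f}(\mathbf{a}) \big) - L(t\vec{e}_i)}{|t\vec{e}_i|} = \vec{0}. \tag{1.7.58} \]

We want to get rid of the absolute value signs in the denominator. Since $|t\vec{e}_i| = |t||\vec{e}_i|$ (remember t is a number) and $|\vec{e}_i| = 1$, we can replace $|t\vec{e}_i|$ by $|t|$. The limit in Equation 1.7.58 is 0 for t small, whether t is positive or negative, so we can replace $|t|$ by t:

\[ \lim_{t\vec{e}_i \to \vec{0}} \frac{\mathbf{f}(\mathbf{a} + t\vec{e}_i) - \mathbf{f}(\mathbf{a}) - L(t\vec{e}_i)}{t} = \vec{0}. \tag{1.7.59} \]

Using the linearity of the derivative, we see that

\[ L(t\vec{e}_i) = tL(\vec{e}_i), \tag{1.7.60} \]

so we can rewrite Equation 1.7.59 as

\[ \lim_{t\vec{e}_i \to \vec{0}} \frac{\mathbf{f}(\mathbf{a} + t\vec{e}_i) - \mathbf{f}(\mathbf{a})}{t} - L(\vec{e}_i) = \vec{0}. \tag{1.7.61} \]

The first term is precisely Definition 1.7.5 of the partial derivative. So $L(\vec{e}_i) = D_i\mathbf{f}(\mathbf{a})$: the ith column of the matrix corresponding to the linear transformation L is indeed $D_i\mathbf{f}(\mathbf{a})$. In other words, the matrix corresponding to L is the Jacobian matrix. □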
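The proof suggests a direct numerical recipe: the ith column of the matrix is approximated by the difference quotient along $\vec{e}_i$. The sketch below applies it to the mapping of Equation 1.7.35; the function name `jacobian_columns`, the base point and the step size are assumptions made for illustration.

```python
def f(p):
    # The mapping of Equation 1.7.35: (x, y) -> (xy, x^2 - y^2)
    x, y = p
    return [x * y, x * x - y * y]

def jacobian_columns(func, a, t=1e-6):
    # Column i is approximated by (func(a + t e_i) - func(a)) / t
    fa = func(a)
    cols = []
    for i in range(len(a)):
        shifted = list(a)
        shifted[i] += t  # move only the i-th coordinate: a + t e_i
        fs = func(shifted)
        cols.append([(fs[j] - fa[j]) / t for j in range(len(fa))])
    return cols

a = [3.0, -2.0]
approx_cols = jacobian_columns(f, a)
# Columns of [[y, x], [2x, -2y]] at (x, y) = a
exact_cols = [[a[1], 2 * a[0]], [a[0], -2 * a[1]]]

print(approx_cols)
print(exact_cols)
```

The approximate columns agree with the hand-computed ones only up to an error of the order of the step t, since t is small but not zero.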


1.8 Rules for computing derivatives 


In this section we state rules for computing derivatives. Some are grouped in Theorem 1.8.1 below; the chain rule is discussed separately, in Theorem 1.8.2. These rules allow you to differentiate any function that is given by a formula.


Theorem 1.8.1 (Rules for computing derivatives). Let $U \subset \mathbb{R}^n$ be an open set.

(1) If $\mathbf{f} : U \to \mathbb{R}^m$ is a constant function, then $\mathbf{f}$ is differentiable, and its derivative is [0] (i.e., it is the zero linear transformation $\mathbb{R}^n \to \mathbb{R}^m$, represented by the m × n matrix filled with zeroes).

(2) If $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m$ is linear, then it is differentiable everywhere, and its derivative at all points $\mathbf{a}$ is $\mathbf{f}$:

\[ [D\mathbf{f}(\mathbf{a})]\vec{v} = \mathbf{f}(\vec{v}). \tag{1.8.1} \]

Note that the terms on the right of Equation 1.8.4 belong to the indicated spaces, and therefore the whole expression makes sense; it is the sum of two vectors in $\mathbb{R}^m$, each of which is the product of a vector in $\mathbb{R}^m$ and a number. Note that $[Df(\mathbf{a})]\vec{v}$ is the product of a line matrix and a vector, hence it is a number.

The expression $\mathbf{f}, \mathbf{g} : U \to \mathbb{R}^m$ in (4) and (6) is shorthand for $\mathbf{f} : U \to \mathbb{R}^m$ and $\mathbf{g} : U \to \mathbb{R}^m$. We discussed in Remark 1.5.1 the importance of limiting the domain to an open subset.


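(3) If $f_1, \dots, f_m : U \to \mathbb{R}$ are m scalar-valued functions differentiable at $\mathbf{a}$, then the vector-valued mapping

\[ \mathbf{f} = \begin{pmatrix} f_1 \\ \vdots \\ f_m \end{pmatrix} : U \to \mathbb{R}^m \]

is differentiable at $\mathbf{a}$, with derivative

\[ [D\mathbf{f}(\mathbf{a})]\vec{v} = \begin{bmatrix} [Df_1(\mathbf{a})]\vec{v} \\ \vdots \\ [Df_m(\mathbf{a})]\vec{v} \end{bmatrix}. \tag{1.8.2} \]

(4) If $\mathbf{f}, \mathbf{g} : U \to \mathbb{R}^m$ are differentiable at $\mathbf{a}$, then so is $\mathbf{f} + \mathbf{g}$, and

\[ [D(\mathbf{f} + \mathbf{g})(\mathbf{a})] = [D\mathbf{f}(\mathbf{a})] + [D\mathbf{g}(\mathbf{a})]. \tag{1.8.3} \]

(5) If $f : U \to \mathbb{R}$ and $\mathbf{g} : U \to \mathbb{R}^m$ are differentiable at $\mathbf{a}$, then so is $f\mathbf{g}$, and the derivative is given by

\[ [D(f\mathbf{g})(\mathbf{a})]\vec{v} = \underbrace{f(\mathbf{a})}_{\mathbb{R}} \underbrace{[D\mathbf{g}(\mathbf{a})]\vec{v}}_{\mathbb{R}^m} + \underbrace{\big( [Df(\mathbf{a})]\vec{v} \big)}_{\mathbb{R}} \underbrace{\mathbf{g}(\mathbf{a})}_{\mathbb{R}^m}. \tag{1.8.4} \]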


(5) Example of /g: if 

f(x)=x 2 and = , 

the»/g(x)=(^^). 

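(6) Example of $\mathbf{f} \cdot \mathbf{g}$: if

\[ \mathbf{f}(x) = \begin{pmatrix} x \\ x^2 \end{pmatrix}, \quad \mathbf{g}(x) = \begin{pmatrix} \sin x \\ \cos x \end{pmatrix}, \]

then their dot product is

\[ (\mathbf{f} \cdot \mathbf{g})(x) = x \sin x + x^2 \cos x. \]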


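(6) If $\mathbf{f}, \mathbf{g} : U \to \mathbb{R}^m$ are both differentiable at $\mathbf{a}$, then so is the dot product $\mathbf{f} \cdot \mathbf{g} : U \to \mathbb{R}$, and (as in one dimension)

\[ [D(\mathbf{f} \cdot \mathbf{g})(\mathbf{a})]\vec{v} = \underbrace{[D\mathbf{f}(\mathbf{a})]\vec{v}}_{\mathbb{R}^m} \cdot \underbrace{\mathbf{g}(\mathbf{a})}_{\mathbb{R}^m} + \underbrace{\mathbf{f}(\mathbf{a})}_{\mathbb{R}^m} \cdot \underbrace{[D\mathbf{g}(\mathbf{a})]\vec{v}}_{\mathbb{R}^m}. \tag{1.8.5} \]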


As shown in the proof below, the rules are either immediate, or they are 
straightforward applications of the corresponding one-dimensional statements. 
However, we hesitate to call them (or any other proof) easy; when we are 
struggling to learn a new piece on the piano, we do not enjoy seeing that it has 
been labeled an “easy piece for beginners.” 
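Rule (6) can be compared against a difference quotient in a small case. In the sketch below the functions `f` and `g`, the point 0.7 and the helper names are illustrative assumptions; with a one-dimensional domain, the derivatives are one-column matrices and the increment v is a number.

```python
import math

def f(x):
    return (x, x * x)

def g(x):
    return (math.sin(x), math.cos(x))

def df(x):
    # Derivative of f: the column (1, 2x)
    return (1.0, 2.0 * x)

def dg(x):
    # Derivative of g: the column (cos x, -sin x)
    return (math.cos(x), -math.sin(x))

def dot(u, w):
    return sum(ui * wi for ui, wi in zip(u, w))

def rule_six(x, v):
    # ([Df(x)]v) . g(x) + f(x) . ([Dg(x)]v), as in Equation 1.8.5
    dfv = tuple(c * v for c in df(x))
    dgv = tuple(c * v for c in dg(x))
    return dot(dfv, g(x)) + dot(f(x), dgv)

def quotient(x, v, h=1e-6):
    # Symmetric difference quotient of the scalar function f . g
    plus = dot(f(x + h * v), g(x + h * v))
    minus = dot(f(x - h * v), g(x - h * v))
    return (plus - minus) / (2 * h)

x0, v0 = 0.7, 1.0
by_rule = rule_six(x0, v0)
by_quotient = quotient(x0, v0)
print(by_rule, by_quotient)
```

The symmetric quotient is used here only because it converges faster than the one-sided quotient; it is a numerical convenience, not the definition of the derivative.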

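Proof of 1.8.1. (1) If $\mathbf{f}$ is a constant function, then $\mathbf{f}(\mathbf{a} + \vec{h}) = \mathbf{f}(\mathbf{a})$, so the derivative $[D\mathbf{f}(\mathbf{a})]$ is the zero linear transformation:

\[ \lim_{\vec{h} \to \vec{0}} \frac{1}{|\vec{h}|} \Big( \mathbf{f}(\mathbf{a} + \vec{h}) - \mathbf{f}(\mathbf{a}) - \underbrace{[0]\vec{h}}_{[D\mathbf{f}(\mathbf{a})]\vec{h}} \Big) = \lim_{\vec{h} \to \vec{0}} \frac{1}{|\vec{h}|} \vec{0} = \vec{0}. \tag{1.8.6} \]

(2) Suppose $\mathbf{f}$ is linear. Then $[D\mathbf{f}(\mathbf{a})] = \mathbf{f}$:

\[ \lim_{\vec{h} \to \vec{0}} \frac{1}{|\vec{h}|} \big( \mathbf{f}(\mathbf{a} + \vec{h}) - \mathbf{f}(\mathbf{a}) - \mathbf{f}(\vec{h}) \big) = \vec{0}, \tag{1.8.7} \]

since $\mathbf{f}(\mathbf{a} + \vec{h}) = \mathbf{f}(\mathbf{a}) + \mathbf{f}(\vec{h})$.

(3) Just write everything out:

\[ \begin{aligned} \lim_{\vec{h} \to \vec{0}} \frac{1}{|\vec{h}|} \left( \begin{pmatrix} f_1(\mathbf{a} + \vec{h}) \\ \vdots \\ f_m(\mathbf{a} + \vec{h}) \end{pmatrix} - \begin{pmatrix} f_1(\mathbf{a}) \\ \vdots \\ f_m(\mathbf{a}) \end{pmatrix} - \begin{bmatrix} [Df_1(\mathbf{a})]\vec{h} \\ \vdots \\ [Df_m(\mathbf{a})]\vec{h} \end{bmatrix} \right) &= \begin{bmatrix} \lim_{\vec{h} \to \vec{0}} \frac{1}{|\vec{h}|} \big( f_1(\mathbf{a} + \vec{h}) - f_1(\mathbf{a}) - [Df_1(\mathbf{a})]\vec{h} \big) \\ \vdots \\ \lim_{\vec{h} \to \vec{0}} \frac{1}{|\vec{h}|} \big( f_m(\mathbf{a} + \vec{h}) - f_m(\mathbf{a}) - [Df_m(\mathbf{a})]\vec{h} \big) \end{bmatrix} \\ &= \vec{0}. \end{aligned} \tag{1.8.8} \]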


According to a contemporary, the French mathematician Laplace (1749-1827) used the formula il est aisé à voir (“it’s easy to see”) whenever he himself couldn’t remember the details of how his reasoning went, but was sure his conclusions were correct. “I never come across one of Laplace’s ‘Thus it plainly appears’ without feeling sure that I have hours of hard work before me to fill up the chasm and find out and show how it plainly appears,” wrote the nineteenth century mathematician N. Bowditch.


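(4) Functions are added point by point, so we can separate out $\mathbf{f}$ and $\mathbf{g}$:

\[ \begin{aligned} &(\mathbf{f} + \mathbf{g})(\mathbf{a} + \vec{h}) - (\mathbf{f} + \mathbf{g})(\mathbf{a}) - \big( [D\mathbf{f}(\mathbf{a})] + [D\mathbf{g}(\mathbf{a})] \big)\vec{h} \\ &\quad = \big( \mathbf{f}(\mathbf{a} + \vec{h}) - \mathbf{f}(\mathbf{a}) - [D\mathbf{f}(\mathbf{a})]\vec{h} \big) + \big( \mathbf{g}(\mathbf{a} + \vec{h}) - \mathbf{g}(\mathbf{a}) - [D\mathbf{g}(\mathbf{a})]\vec{h} \big). \end{aligned} \tag{1.8.9} \]

Now divide by $|\vec{h}|$, and take the limit as $|\vec{h}| \to 0$. The right-hand side gives $\vec{0} + \vec{0} = \vec{0}$, so the left-hand side does too.

(5) By part (3), we may assume that m = 1, i.e., that $\mathbf{g} = g$ is scalar valued. Then

\[ \begin{aligned} \underbrace{[D(fg)(\mathbf{a})]}_{\text{Jacobian matrix of } fg}\vec{h} &= \big[ (D_1 fg)(\mathbf{a}), \dots, (D_n fg)(\mathbf{a}) \big]\vec{h} \\ &= \underbrace{\big[ f(\mathbf{a})(D_1 g)(\mathbf{a}) + (D_1 f)(\mathbf{a})g(\mathbf{a}), \dots, f(\mathbf{a})(D_n g)(\mathbf{a}) + (D_n f)(\mathbf{a})g(\mathbf{a}) \big]}_{\text{in one variable, } (fg)' = fg' + f'g}\vec{h} \\ &= f(\mathbf{a})\underbrace{\big[ (D_1 g)(\mathbf{a}), \dots, (D_n g)(\mathbf{a}) \big]}_{\text{Jacobian matrix of } g}\vec{h} + \underbrace{\big[ (D_1 f)(\mathbf{a}), \dots, (D_n f)(\mathbf{a}) \big]}_{\text{Jacobian matrix of } f} g(\mathbf{a})\vec{h} \\ &= f(\mathbf{a})\big( [Dg(\mathbf{a})]\vec{h} \big) + \big( [Df(\mathbf{a})]\vec{h} \big) g(\mathbf{a}). \end{aligned} \tag{1.8.10} \]

(6) Again, write everything out:

\[ \begin{aligned} [D(\mathbf{f} \cdot \mathbf{g})(\mathbf{a})]\vec{h} &\underset{\text{def. of dot prod.}}{=} \Big[ D\Big( \sum_{i=1}^{m} f_i g_i \Big)(\mathbf{a}) \Big]\vec{h} = \sum_{i=1}^{m} [D(f_i g_i)(\mathbf{a})]\vec{h} \\ &\underset{(5)}{=} \sum_{i=1}^{m} \big( [Df_i(\mathbf{a})]\vec{h} \big) g_i(\mathbf{a}) + f_i(\mathbf{a}) \big( [Dg_i(\mathbf{a})]\vec{h} \big) \\ &\underset{\text{def. of dot prod.}}{=} \big( [D\mathbf{f}(\mathbf{a})]\vec{h} \big) \cdot \mathbf{g}(\mathbf{a}) + \mathbf{f}(\mathbf{a}) \cdot \big( [D\mathbf{g}(\mathbf{a})]\vec{h} \big). \end{aligned} \tag{1.8.11} \]

The second equality uses rule (4) above: $\mathbf{f} \cdot \mathbf{g}$ is the sum of the $f_i g_i$, so the derivative of the sum is the sum of the derivatives. The third equality uses rule (5). A more conceptual proof of (5) and (6) is sketched in Exercise 1.8.1. □

The chain rule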


Some physicists claim that the chain rule is the most important theorem in all of mathematics.

The chain rule is proved in Appendix A.1.


One rule for differentiation is so fundamental that it deserves a subsection of its own: the chain rule, which states that the derivative of a composition is the composition of the derivatives, as shown in Figure 1.8.1.

Theorem 1.8.2 (Chain rule). Let $U \subset \mathbb{R}^n$, $V \subset \mathbb{R}^m$ be open sets, let $\mathbf{g} : U \to V$ and $\mathbf{f} : V \to \mathbb{R}^p$ be mappings and let $\mathbf{a}$ be a point of U. If $\mathbf{g}$ is differentiable at $\mathbf{a}$ and $\mathbf{f}$ is differentiable at $\mathbf{g}(\mathbf{a})$, then the composition $\mathbf{f} \circ \mathbf{g}$ is differentiable at $\mathbf{a}$, and its derivative is given by

\[ [D(\mathbf{f} \circ \mathbf{g})(\mathbf{a})] = [D\mathbf{f}(\mathbf{g}(\mathbf{a}))] \circ [D\mathbf{g}(\mathbf{a})]. \tag{1.8.12} \]


In practice, when we use the chain rule, most often these linear transforma- 
tions will be represented by their matrices, and we will compute the right-hand 
side of Equation 1.8.12 by multiplying the matrices together: 


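\[ [D(\mathbf{f} \circ \mathbf{g})(\mathbf{a})] = [D\mathbf{f}(\mathbf{g}(\mathbf{a}))][D\mathbf{g}(\mathbf{a})]. \tag{1.8.13} \]

Figure 1.8.1. The function $\mathbf{g}$ maps a point $\mathbf{a} \in U$ to a point $\mathbf{g}(\mathbf{a}) \in V$. The function $\mathbf{f}$ maps the point $\mathbf{g}(\mathbf{a}) = \mathbf{b}$ to the point $\mathbf{f}(\mathbf{b})$. The derivative of $\mathbf{g}$ maps the vector $\vec{v}$ to $[D\mathbf{g}(\mathbf{a})]\vec{v} = \vec{w}$. The derivative of $\mathbf{f} \circ \mathbf{g}$ maps $\vec{v}$ to $[D\mathbf{f}(\mathbf{b})]\vec{w} = [D\mathbf{f}(\mathbf{b})][D\mathbf{g}(\mathbf{a})]\vec{v} = [D(\mathbf{f} \circ \mathbf{g})(\mathbf{a})]\vec{v}$.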


Remark. One motivation for discussing matrices, matrix multiplication, linear 
transformations and the relation of composition of linear transformations to 
matrix multiplication at the beginning of this chapter was to have these tools 

available now. In coordinates, and using matrix multiplication, the chain rule 
states that 

\[ D_j(\mathbf{f} \circ \mathbf{g})_i(\mathbf{a}) = \sum_{k=1}^{m} D_k f_i(\mathbf{g}(\mathbf{a})) \, D_j g_k(\mathbf{a}). \tag{1.8.14} \]

We will need this form of the chain rule often, but as a statement, it is a disaster: it makes a fundamental and transparent statement into a messy formula, the proof of which seems to be a computational miracle. △

In Example 1.8.3, $\mathbb{R}^3$ plays the role of V in Theorem 1.8.2.

You can see why the range of $\mathbf{g}$ and the domain of f must be the same (i.e., V in the theorem; $\mathbb{R}^3$ in this example): the width of $[Df(\mathbf{g}(t))]$ must equal the height of $[D\mathbf{g}(t)]$ for the multiplication to be possible.

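Example 1.8.3 (The derivative of a composition). Suppose $\mathbf{g} : \mathbb{R} \to \mathbb{R}^3$ and $f : \mathbb{R}^3 \to \mathbb{R}$ are the functions

\[ f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = x^2 + y^2 + z^2; \qquad \mathbf{g}(t) = \begin{pmatrix} t \\ t^2 \\ t^3 \end{pmatrix}. \tag{1.8.15} \]

The derivatives (Jacobian matrices) of these functions are computed by computing separately the partial derivatives, giving, for f,

\[ \left[ Df\begin{pmatrix} x \\ y \\ z \end{pmatrix} \right] = [2x, 2y, 2z]. \tag{1.8.16} \]

(The derivative of f is a one-row matrix.) The derivative of f evaluated at $\mathbf{g}(t)$ is thus $[2t, 2t^2, 2t^3]$. The derivative of $\mathbf{g}$ at t is

\[ [D\mathbf{g}(t)] = \begin{bmatrix} 1 \\ 2t \\ 3t^2 \end{bmatrix}. \tag{1.8.17} \]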


Equation 1.7.45 says that the derivative of the “squaring function” f is

\[ [Df(A)]H = AH + HA. \]

In the second line of Equation 1.8.19, $g(A) = A^{-1}$ plays the role of A above, and $-A^{-1}HA^{-1}$ plays the role of H.

Notice the interesting way this result is related to the one-variable computation: if $f(x) = x^{-2}$, then $f'(x) = -2x^{-3}$. Notice also how much easier this computation is, using the chain rule, than the proof of Proposition 1.7.16, without the chain rule.


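So the derivative at t of the composition $f \circ \mathbf{g}$ is

\[ [D(f \circ \mathbf{g})(t)] = [Df(\mathbf{g}(t))] \circ [D\mathbf{g}(t)] = \underbrace{[2t, 2t^2, 2t^3]}_{[Df(\mathbf{g}(t))]} \underbrace{\begin{bmatrix} 1 \\ 2t \\ 3t^2 \end{bmatrix}}_{[D\mathbf{g}(t)]} = 2t + 4t^3 + 6t^5. \quad \triangle \tag{1.8.18} \]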
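The matrix product in Equation 1.8.18 can be compared with the closed form and with a difference quotient of the composition. This is a minimal sketch; the names `chain_rule`, `closed_form` and `quotient`, and the sample point t = 1.5, are choices made here for illustration.

```python
def g(t):
    return (t, t ** 2, t ** 3)

def f(x, y, z):
    return x * x + y * y + z * z

def Df(x, y, z):
    # One-row Jacobian matrix of f
    return [2 * x, 2 * y, 2 * z]

def Dg(t):
    # One-column Jacobian matrix of g
    return [1.0, 2 * t, 3 * t ** 2]

def chain_rule(t):
    # [Df(g(t))][Dg(t)]: a 1x3 matrix times a 3x1 matrix
    return sum(r * c for r, c in zip(Df(*g(t)), Dg(t)))

def closed_form(t):
    return 2 * t + 4 * t ** 3 + 6 * t ** 5

def quotient(t, h=1e-6):
    # Symmetric difference quotient of the composition f(g(t))
    return (f(*g(t + h)) - f(*g(t - h))) / (2 * h)

t0 = 1.5
print(chain_rule(t0), closed_form(t0), quotient(t0))
```

Since $f(\mathbf{g}(t)) = t^2 + t^4 + t^6$, the same answer can also be read off by differentiating that polynomial directly.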


Example 1.8.4 (Composition of linear transformations). Here is a case where it is easier to think of the derivative as a linear transformation than as a matrix, and of the chain rule as speaking of a composition of linear transformations rather than a product of matrices. If A and H are n × n matrices, and $f(A) = A^2$, $g(A) = A^{-1}$, then $(f \circ g)(A) = A^{-2}$. To compute the derivative of $f \circ g$ we use the chain rule in the first line, Proposition 1.7.16 in the second and Equation 1.7.45 in the third:

\[ \begin{aligned} [D(f \circ g)(A)]H &= [Df(g(A))][Dg(A)]H \\ &= [Df(g(A))](-A^{-1}HA^{-1}) \\ &= A^{-1}(-A^{-1}HA^{-1}) + (-A^{-1}HA^{-1})A^{-1} \\ &= -(A^{-2}HA^{-1} + A^{-1}HA^{-2}). \end{aligned} \tag{1.8.19} \]

Exercise 1.8.7 asks you to compute the derivatives of the maps $A \mapsto A^{-3}$ and $A \mapsto A^{-n}$.

1.9 The Mean Value Theorem and Criteria for Differentiability

I turn, with terror and horror, from this lamentable scourge of continuous functions with no derivatives.
-Charles Hermite, in a letter to Thomas Stieltjes, 1893

In this section we discuss two applications of the mean value theorem (Theorem 1.6.9). The first extends that theorem to functions of several variables, and the second gives a criterion for when a function is differentiable.

The mean value theorem for functions of several variables 

The derivative measures the difference of the values of functions at different points. For functions of one variable, the mean value theorem says that if $f : [a, b] \to \mathbb{R}$ is continuous, and f is differentiable on (a, b), then there exists $c \in (a, b)$ such that

\[ f(b) - f(a) = f'(c)(b - a). \tag{1.9.1} \]

The analogous statement in several variables is the following.

Theorem 1.9.1 (Mean value theorem for functions of several variables). Let $U \subset \mathbb{R}^n$ be open, $f : U \to \mathbb{R}$ be differentiable, and the segment $[\mathbf{a}, \mathbf{b}]$ joining $\mathbf{a}$ to $\mathbf{b}$ be contained in U. Then there exists $\mathbf{c} \in [\mathbf{a}, \mathbf{b}]$ such that

\[ f(\mathbf{b}) - f(\mathbf{a}) = [Df(\mathbf{c})](\mathbf{b} - \mathbf{a}). \tag{1.9.2} \]


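Corollary 1.9.2. If f is a function as defined in Theorem 1.9.1, then

\[ |f(\mathbf{b}) - f(\mathbf{a})| \le \Big( \sup_{\mathbf{c} \in [\mathbf{a}, \mathbf{b}]} \big| [Df(\mathbf{c})] \big| \Big) |\mathbf{b} - \mathbf{a}|. \tag{1.9.3} \]

Proof of Corollary 1.9.2. This follows immediately from Theorem 1.9.1 and Proposition 1.4.11. □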


Proof of Theorem 1.9.1. Note that as t varies from 0 to 1, the point $(1 - t)\mathbf{a} + t\mathbf{b}$ moves from $\mathbf{a}$ to $\mathbf{b}$. Consider the mapping $g(t) = f((1 - t)\mathbf{a} + t\mathbf{b})$. By the chain rule, g is differentiable, and by the one-variable mean value theorem, there exists $t_0$ such that

\[ g(1) - g(0) = g'(t_0)(1 - 0) = g'(t_0). \tag{1.9.4} \]

Set $\mathbf{c} = (1 - t_0)\mathbf{a} + t_0\mathbf{b}$. By Proposition 1.7.12, we can express $g'(t_0)$ in terms of the derivative of f:

\[ g'(t_0) = \lim_{s \to 0} \frac{g(t_0 + s) - g(t_0)}{s} = \lim_{s \to 0} \frac{f(\mathbf{c} + s(\mathbf{b} - \mathbf{a})) - f(\mathbf{c})}{s} = [Df(\mathbf{c})](\mathbf{b} - \mathbf{a}). \tag{1.9.5} \]

So Equation 1.9.4 reads

\[ f(\mathbf{b}) - f(\mathbf{a}) = [Df(\mathbf{c})](\mathbf{b} - \mathbf{a}). \quad \Box \tag{1.9.6} \]
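For a concrete function the point $\mathbf{c}$ of Theorem 1.9.1 can be located numerically. In the sketch below the function, the endpoints and the use of bisection are all assumptions made for illustration; bisection works here only because, for this particular example, the quantity `gap` changes sign between t = 0 and t = 1.

```python
def f(x, y):
    # Illustrative differentiable function on the plane
    return x ** 2 + y ** 3

def Df(x, y):
    # One-row Jacobian matrix of f
    return (2 * x, 3 * y ** 2)

a = (0.0, 0.0)
b = (1.0, 2.0)
d = (b[0] - a[0], b[1] - a[1])  # the vector b - a

def point(t):
    # (1 - t) a + t b
    return (a[0] + t * d[0], a[1] + t * d[1])

def gap(t):
    # [Df(c)](b - a) - (f(b) - f(a)) with c = point(t); zero exactly when c works
    row = Df(*point(t))
    return row[0] * d[0] + row[1] * d[1] - (f(*b) - f(*a))

lo, hi = 0.0, 1.0
for _ in range(60):
    mid = (lo + hi) / 2
    if gap(lo) * gap(mid) <= 0:
        hi = mid
    else:
        lo = mid

t0 = (lo + hi) / 2
c = point(t0)
print(t0, c, gap(t0))
```

Along this segment $g(t) = t^2 + 8t^3$, so $t_0$ is the positive root of $24t^2 + 2t - 9 = 0$, which gives an independent check on the bisection.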


Criterion for differentiability 



The graph of f is made up of straight lines through the origin, so if you leave the origin in any direction, the directional derivative in that direction certainly exists. Both axes are among the lines making up the graph, so the directional derivatives in those directions are 0. But clearly there is no tangent plane to the graph at the origin.


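Most often, the Jacobian matrix of a function is its derivative. But as we mentioned in Section 1.7, this isn’t always true. It is perfectly possible for all partial derivatives of f to exist, and even all directional derivatives, and yet for f not to be differentiable! In such a case the Jacobian matrix exists but does not represent the derivative. This happens even for the innocent-looking function

\[ f\begin{pmatrix} x \\ y \end{pmatrix} = \frac{x^2 y}{x^2 + y^2}. \]

Actually, we should write this function

\[ f\begin{pmatrix} x \\ y \end{pmatrix} = \begin{cases} \dfrac{x^2 y}{x^2 + y^2} & \text{if } \begin{pmatrix} x \\ y \end{pmatrix} \ne \begin{pmatrix} 0 \\ 0 \end{pmatrix} \\[2ex] 0 & \text{if } \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \end{cases} \tag{1.9.7} \]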


shown in Figure 1.9.1. You have probably learned to be suspicious of functions that are defined by different formulas for different values of the variable. In this case, the value at $\begin{pmatrix} 0 \\ 0 \end{pmatrix}$ is really natural, in the sense that as $\begin{pmatrix} x \\ y \end{pmatrix}$ approaches $\begin{pmatrix} 0 \\ 0 \end{pmatrix}$, the function f approaches 0. This is not one of those functions whose value takes a sudden jump; indeed, f is continuous everywhere. Away from the origin, this is obvious; f is then defined by an algebraic formula, and we can compute both its partial derivatives at any point $\begin{pmatrix} x \\ y \end{pmatrix} \ne \begin{pmatrix} 0 \\ 0 \end{pmatrix}$.

That f is continuous at the origin requires a little checking, as follows. If $x^2 + y^2 = r^2$, then $|x| \le r$ and $|y| \le r$ so $|x^2 y| \le r^3$. Therefore,


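\[ \left| f\begin{pmatrix} x \\ y \end{pmatrix} \right| \le \frac{r^3}{r^2} = r, \quad \text{and} \quad \lim_{\begin{pmatrix} x \\ y \end{pmatrix} \to \begin{pmatrix} 0 \\ 0 \end{pmatrix}} f\begin{pmatrix} x \\ y \end{pmatrix} = 0. \tag{1.9.8} \]

So f is continuous at the origin. Moreover, f vanishes identically on both axes, so both partial derivatives of f vanish at the origin.

“Vanish” means to equal 0. “Identically” means “at every point.”

If we change our function, replacing the $x^2 y$ in the numerator of the algebraic formula by $xy$, then the resulting function, which we’ll call g, will not be continuous at the origin. If x = y, then g = 1/2 no matter how close $\begin{pmatrix} x \\ y \end{pmatrix}$ gets to the origin: we then have

\[ g\begin{pmatrix} x \\ x \end{pmatrix} = \frac{x^2}{2x^2} = \frac{1}{2}. \]

So far, f looks perfectly civilized: it is continuous, and both partial derivatives exist everywhere. But consider the derivative in the direction of the vector $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$, i.e., the directional derivative

\[ \lim_{t \to 0} \frac{1}{t} \left( f\begin{pmatrix} t \\ t \end{pmatrix} - f\begin{pmatrix} 0 \\ 0 \end{pmatrix} \right) = \lim_{t \to 0} \frac{t^3}{2t^3} = \frac{1}{2}. \tag{1.9.9} \]

This is not what we get when we compute the same directional derivative by multiplying the Jacobian matrix of f by the vector $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$, as in the right-hand side of Equation 1.7.27:

\[ \underbrace{\left[ D_1 f\begin{pmatrix} 0 \\ 0 \end{pmatrix}, D_2 f\begin{pmatrix} 0 \\ 0 \end{pmatrix} \right]}_{\text{Jacobian matrix } [Jf(\mathbf{0})]} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = [0, 0] \begin{bmatrix} 1 \\ 1 \end{bmatrix} = 0. \tag{1.9.10} \]

Thus, by Proposition 1.7.12, f is not differentiable.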
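The mismatch between the two computations is easy to reproduce numerically. The sketch below is illustrative: the helper `quotient` and the step $10^{-8}$ are choices made here, and the difference quotients only approximate the limits.

```python
def f(x, y):
    # The function of Equation 1.9.7
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * x * y / (x * x + y * y)

def quotient(v, t):
    # (f(0 + t v) - f(0)) / t, approximating the directional derivative at the origin
    return (f(t * v[0], t * v[1]) - f(0.0, 0.0)) / t

t = 1e-8
d1 = quotient((1.0, 0.0), t)  # approximates D1 f at the origin
d2 = quotient((0.0, 1.0), t)  # approximates D2 f at the origin

along_diagonal = quotient((1.0, 1.0), t)  # direct computation, as in Equation 1.9.9
from_jacobian = d1 * 1.0 + d2 * 1.0       # Jacobian matrix times (1, 1), as in Equation 1.9.10

print(along_diagonal, from_jacobian)
```

The first number should be close to 1/2 and the second should be 0: the Jacobian matrix exists, yet multiplying by it does not produce the directional derivative.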

In fact, things can get worse. The function we just discussed is continuous, 
but it is possible for all directional derivatives of a function to exist, and yet 
for the function not to be continuous, or even bounded in a neighborhood of 0, 
as we saw in Example 1.5.24; Exercise 1.9.2 provides another example. 


Continuously differentiable functions 

The lesson so far is that knowing a function’s partial derivatives or directional derivatives does not tell you either that the function is differentiable or that it is continuous. Even in one variable, derivatives alone reveal much less than you might expect; we will see in Example 1.9.4 that a function $f : \mathbb{R} \to \mathbb{R}$ can have a positive derivative at x although it does not increase in a neighborhood of x!

Of course we don’t claim that derivatives are worthless. The problem in 
these pathological cases is that the function is not continuously differentiable: its 
derivative is not continuous. As long as a function is continuously differentiable, 
things behave nicely. 


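Example 1.9.3 (A function that has partial derivatives but is not differentiable). Let us go back to the function of Equation 1.9.7, which we just saw is not differentiable:

\[ f\begin{pmatrix} x \\ y \end{pmatrix} = \begin{cases} \dfrac{x^2 y}{x^2 + y^2} & \text{if } \begin{pmatrix} x \\ y \end{pmatrix} \ne \begin{pmatrix} 0 \\ 0 \end{pmatrix} \\[2ex] 0 & \text{if } \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \end{cases} \]

and reconsider its partial derivatives. We find that both partials are 0 at the origin, and that away from the origin, i.e., if $\begin{pmatrix} x \\ y \end{pmatrix} \ne \begin{pmatrix} 0 \\ 0 \end{pmatrix}$, then

\[ \begin{aligned} D_1 f\begin{pmatrix} x \\ y \end{pmatrix} &= \frac{(x^2 + y^2)(2xy) - x^2 y(2x)}{(x^2 + y^2)^2} = \frac{2xy^3}{(x^2 + y^2)^2}, \\ D_2 f\begin{pmatrix} x \\ y \end{pmatrix} &= \frac{(x^2 + y^2)(x^2) - x^2 y(2y)}{(x^2 + y^2)^2} = \frac{x^4 - x^2 y^2}{(x^2 + y^2)^2}. \end{aligned} \tag{1.9.11} \]

These partial derivatives are not continuous at the origin, as you will see if you approach the origin from any direction other than one of the axes. For example, if you compute the first partial derivative at the point $\begin{pmatrix} t \\ t \end{pmatrix}$ of the diagonal, you find the limit

\[ \lim_{t \to 0} D_1 f\begin{pmatrix} t \\ t \end{pmatrix} = \frac{2t^4}{(2t^2)^2} = \frac{1}{2}, \tag{1.9.12} \]

which is not the value of

\[ D_1 f\begin{pmatrix} 0 \\ 0 \end{pmatrix} = 0. \quad \triangle \tag{1.9.13} \]

Figure 1.9.2. Graph of the function $f(x) = \frac{x}{2} + 6x^2 \sin\frac{1}{x}$. The derivative of f does not have a limit at the origin, but the curve still has slope 1/2 there.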

Example 1.9.4 (A differentiable yet pathological function in one variable). Consider the function

\[ f(x) = \frac{x}{2} + x^2 \sin\frac{1}{x}, \tag{1.9.14} \]

a variant of which is shown in Figure 1.9.2. To be precise, one should add f(0) = 0, since $\sin 1/x$ is not defined there, but this was the only reasonable value, since

\[ \lim_{x \to 0} x^2 \sin\frac{1}{x} = 0. \tag{1.9.15} \]

Moreover, we will see that the function f is differentiable at the origin, with derivative

\[ f'(0) = \frac{1}{2}. \tag{1.9.16} \]

This is one case where you must use the definition of the derivative as a limit; you cannot use the rules for computing derivatives blindly. In fact, let’s try. We find

\[ f'(x) = \frac{1}{2} + 2x \sin\frac{1}{x} + x^2 \left( \cos\frac{1}{x} \right) \left( -\frac{1}{x^2} \right) = \frac{1}{2} + 2x \sin\frac{1}{x} - \cos\frac{1}{x}. \tag{1.9.17} \]

This formula is certainly correct for $x \ne 0$, but $f'(x)$ doesn’t have a limit when $x \to 0$. Indeed,

\[ \lim_{x \to 0} \left( \frac{1}{2} + 2x \sin\frac{1}{x} \right) = \frac{1}{2} \tag{1.9.18} \]

does exist, but $\cos 1/x$ oscillates infinitely many times between -1 and 1. So $f'$ will oscillate from a value near -1/2 to a value near 3/2. This does not mean that f isn’t differentiable at 0. We can compute the derivative at 0 using the definition of the derivative:

\[ \begin{aligned} f'(0) &= \lim_{h \to 0} \frac{1}{h} \left( \frac{0 + h}{2} + (0 + h)^2 \sin\frac{1}{0 + h} \right) = \lim_{h \to 0} \frac{1}{h} \left( \frac{h}{2} + h^2 \sin\frac{1}{h} \right) \\ &= \frac{1}{2} + \lim_{h \to 0} h \sin\frac{1}{h} = \frac{1}{2}, \end{aligned} \tag{1.9.19} \]

since (by Theorem 1.5.21, part (f)) $\lim_{h \to 0} h \sin\frac{1}{h}$ exists, and indeed vanishes.

Finally, we can see that although the derivative at 0 is positive, the function is not increasing in any neighborhood of 0, since in any interval arbitrarily close to 0 the derivative takes negative values; as we saw above, it oscillates from a value near -1/2 to a value near 3/2. △

This is very bad. Our whole point is that the function should behave like its best linear approximation, and in this case it emphatically doesn’t. We could easily make up examples in several variables where the same occurs: where the function is differentiable, so that the Jacobian matrix represents the derivative, but where that derivative doesn’t tell you much.

The moral of the story is: only study continuously differentiable functions.

A function that is continuously differentiable (i.e., whose derivative is continuous) is known as a $C^1$ function.

If you come across a function that is not continuously differentiable (and you may find such functions particularly interesting) you should be aware that none of the usual tools of calculus can be relied upon. Each such function is an outlaw, obeying none of the standard theorems.
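Both halves of Example 1.9.4 can be observed numerically: the difference quotients at 0 settle near 1/2, while the derivative formula keeps returning values near -1/2 arbitrarily close to 0. The sketch below is illustrative; the sample points $x = 1/(2\pi k)$ are chosen here because $\cos(1/x) = 1$ there.

```python
import math

def f(x):
    # The function of Equation 1.9.14, with the value f(0) = 0 added
    return 0.0 if x == 0.0 else x / 2 + x * x * math.sin(1 / x)

def fprime_away_from_zero(x):
    # Formula 1.9.17, valid only for x != 0
    return 0.5 + 2 * x * math.sin(1 / x) - math.cos(1 / x)

# Difference quotients at 0: each equals 1/2 + h sin(1/h), so it is within h of 1/2
quotients = [(h, (f(h) - f(0.0)) / h) for h in (1e-1, 1e-3, 1e-5)]

# Derivative at x = 1/(2 pi k), points that approach 0 as k grows
dips = [fprime_away_from_zero(1 / (2 * math.pi * k)) for k in (1, 10, 100, 1000)]

print(quotients)
print(dips)
```

However small the neighborhood of 0, it contains points of the second kind, so the positive derivative at 0 says nothing about f increasing nearby.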


Determining whether a function is continuously differentiable 

Fortunately, you can do a great deal of mathematics without ever dealing with 
such pathological functions. Moreover, there is a nice criterion that allows us 
to check whether a function in several variables is continuously differentiable: 

Theorem 1.9.5 (Criterion for differentiability). If U is an open subset of $\mathbb{R}^n$, and $\mathbf{f} : U \to \mathbb{R}^m$ is a mapping such that all partial derivatives of $\mathbf{f}$ exist and are continuous on U, then $\mathbf{f}$ is differentiable on U, and its derivative is given by its Jacobian matrix.

Definition 1.9.6 (Continuously differentiable function). A function is continuously differentiable on $U \subset \mathbb{R}^n$ if all its partial derivatives exist and are continuous on U.

Most often, when checking that a function is differentiable, the criterion of 
Theorem 1.9.5 is the tool used. Note that the last part, “ . . . and its derivative is 
given by its Jacobian matrix,” is obvious; if a function is differentiable, Theorem 
1.7.10 tells us that its derivative is given by its Jacobian matrix. So the point 
is to prove that the function is differentiable. Since we are told that the partial 
derivatives of f are continuous, if we prove that f is differentiable, we will have 
proved that it is continuously differentiable. 
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In Equation 1.9.20 we use the 
interval (a, a -f h), rather than 
(a, 5), making the statement 

/'( c ) = /(« -t- h) — f(a ) 
h 
or 

hf'{c) = f(a + h) - f(a). 


(What are the dimensions of the derivative of the function f described in 
Theorem 1.9.5? Check your answer below. 23 ) 

Proof. This is an application of Theorem 1.6.9, the mean value theorem. What 
we need to show is that 

lim -if (a + h) - f(a) - [Jf(a)]h = 0. 1.9.20 

i»— o |h| 


It will become clearer in Chapter 2 why we emphasize the dimensions
of the derivative $[\mathbf{D}\mathbf{f}(\mathbf{a})]$. The object of differential calculus
is to study nonlinear mappings by studying their linear approximations,
using the derivative. We will want to have at our disposal the techniques
of linear algebra. Many will involve knowing the dimensions of a matrix.


First, note (Theorem 1.8.1, part (3)) that it is enough to prove it when
$m = 1$ (i.e., $f : U \to \mathbb{R}$).

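Next write

$$f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) = f\begin{pmatrix} a_1 + h_1 \\ a_2 + h_2 \\ \vdots \\ a_n + h_n \end{pmatrix} - f\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} \qquad (1.9.21)$$

in expanded form, subtracting and adding inner terms:

$$\begin{aligned}
f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) = {} & f\begin{pmatrix} a_1 + h_1 \\ a_2 + h_2 \\ \vdots \\ a_n + h_n \end{pmatrix} - \underbrace{f\begin{pmatrix} a_1 \\ a_2 + h_2 \\ \vdots \\ a_n + h_n \end{pmatrix}}_{\text{subtracted}} \\
& + \underbrace{f\begin{pmatrix} a_1 \\ a_2 + h_2 \\ \vdots \\ a_n + h_n \end{pmatrix}}_{\text{added}} - f\begin{pmatrix} a_1 \\ a_2 \\ a_3 + h_3 \\ \vdots \\ a_n + h_n \end{pmatrix} \\
& + \cdots \\
& + f\begin{pmatrix} a_1 \\ \vdots \\ a_{n-1} \\ a_n + h_n \end{pmatrix} - f\begin{pmatrix} a_1 \\ \vdots \\ a_{n-1} \\ a_n \end{pmatrix}.
\end{aligned} \qquad (1.9.22)$$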


23 The function $\mathbf{f}$ goes from a subset of $\mathbb{R}^n$ to $\mathbb{R}^m$, so its derivative takes a vector
in $\mathbb{R}^n$ and gives a vector in $\mathbb{R}^m$. Therefore it is an $m \times n$ matrix, $m$ tall and $n$ wide.

The inequality in the second line of Equation 1.9.25 comes from
the fact that $|h_i|/|\mathbf{h}| \le 1$.


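By the mean value theorem, the $i$th term above is

$$\underbrace{f\begin{pmatrix} a_1 \\ \vdots \\ a_{i-1} \\ a_i + h_i \\ a_{i+1} + h_{i+1} \\ \vdots \\ a_n + h_n \end{pmatrix} - f\begin{pmatrix} a_1 \\ \vdots \\ a_{i-1} \\ a_i \\ a_{i+1} + h_{i+1} \\ \vdots \\ a_n + h_n \end{pmatrix}}_{i\text{th term}} = h_i D_i f\begin{pmatrix} a_1 \\ \vdots \\ a_{i-1} \\ b_i \\ a_{i+1} + h_{i+1} \\ \vdots \\ a_n + h_n \end{pmatrix} \qquad (1.9.23)$$

for some $b_i \in [a_i, a_i + h_i]$: there is some point $b_i$ in the interval $[a_i, a_i + h_i]$ such
that the partial derivative $D_i f$ at $b_i$ gives the average change of the function $f$
over that interval, when all variables except the $i$th are kept constant.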

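Since $f$ has $n$ variables, we need to find such a point for every $i$ from 1 to $n$.
We will call these points $\mathbf{c}_i$:

$$\mathbf{c}_i = \begin{pmatrix} a_1 \\ \vdots \\ a_{i-1} \\ b_i \\ a_{i+1} + h_{i+1} \\ \vdots \\ a_n + h_n \end{pmatrix}; \quad \text{this gives} \quad f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) = \sum_{i=1}^{n} h_i D_i f(\mathbf{c}_i).$$

Thus we find that

$$f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) - \sum_{i=1}^{n} D_i f(\mathbf{a}) h_i = \sum_{i=1}^{n} h_i \bigl( D_i f(\mathbf{c}_i) - D_i f(\mathbf{a}) \bigr). \qquad (1.9.24)$$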

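So far we haven’t used the hypothesis that the partial derivatives $D_i f$ are
continuous. Now we do. Since $D_i f$ is continuous, and since $\mathbf{c}_i$ tends to $\mathbf{a}$ as
$\mathbf{h} \to \mathbf{0}$, we see that the theorem is true:

$$\begin{aligned}
\lim_{\mathbf{h} \to \mathbf{0}} \frac{\bigl| f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) - \sum_{i=1}^{n} D_i f(\mathbf{a}) h_i \bigr|}{|\mathbf{h}|}
& \le \lim_{\mathbf{h} \to \mathbf{0}} \sum_{i=1}^{n} \frac{|h_i|}{|\mathbf{h}|} \bigl| D_i f(\mathbf{c}_i) - D_i f(\mathbf{a}) \bigr| \\
& \le \lim_{\mathbf{h} \to \mathbf{0}} \sum_{i=1}^{n} \bigl| D_i f(\mathbf{c}_i) - D_i f(\mathbf{a}) \bigr| = 0. \qquad \square
\end{aligned} \qquad (1.9.25)$$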
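
The bookkeeping behind Equation 1.9.22 can be sketched in a few lines of code. This is an illustration under assumptions, not part of the proof: the function `f` and the vectors `a` and `h` are hypothetical, and indices run from 0 as is usual in Python. The $i$th term changes only the $i$th variable, with later variables already shifted by $h$ and earlier ones left at $a$, so the terms telescope to $f(\mathbf{a}+\mathbf{h}) - f(\mathbf{a})$.

```python
import math

def f(x):
    # Hypothetical scalar function on R^3, chosen only for illustration.
    return x[0] ** 2 * x[1] + math.sin(x[2]) * x[0]

def telescoping_terms(func, a, h):
    # Term i: f(a_1..a_{i-1}, a_i + h_i, ..., a_n + h_n)
    #       - f(a_1..a_i, a_{i+1} + h_{i+1}, ..., a_n + h_n)
    n = len(a)
    terms = []
    for i in range(n):
        p = a[:i] + [a[k] + h[k] for k in range(i, n)]
        q = a[:i + 1] + [a[k] + h[k] for k in range(i + 1, n)]
        terms.append(func(p) - func(q))
    return terms

a = [1.0, 2.0, 0.5]
h = [0.1, -0.2, 0.3]
terms = telescoping_terms(f, a, h)
total = f([ai + hi for ai, hi in zip(a, h)]) - f(a)
print(sum(terms), total)  # the two numbers agree up to rounding
```

Each entry of `terms` is a difference in a single variable, which is exactly what allows the one-variable mean value theorem to be applied term by term.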
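
Example 1.9.7. Here we work out the above computation when $f$ is a function
on $\mathbb{R}^2$:

$$\begin{aligned}
f\begin{pmatrix} a_1 + h_1 \\ a_2 + h_2 \end{pmatrix} - f\begin{pmatrix} a_1 \\ a_2 \end{pmatrix}
& = f\begin{pmatrix} a_1 + h_1 \\ a_2 + h_2 \end{pmatrix} - f\begin{pmatrix} a_1 \\ a_2 + h_2 \end{pmatrix} + f\begin{pmatrix} a_1 \\ a_2 + h_2 \end{pmatrix} - f\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} \\
& = h_1 D_1 f\begin{pmatrix} b_1 \\ a_2 + h_2 \end{pmatrix} + h_2 D_2 f\begin{pmatrix} a_1 \\ b_2 \end{pmatrix} \\
& = h_1 D_1 f(\mathbf{c}_1) + h_2 D_2 f(\mathbf{c}_2). \qquad \triangle
\end{aligned} \qquad (1.9.26)$$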
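
The two-variable computation can be made concrete by locating the mean value points numerically. The sketch below is a hypothetical illustration: the function `f`, its hand-computed partial derivatives `d1f` and `d2f`, and the numbers `a1, a2, h1, h2` are all assumptions chosen for the example. It finds $b_1$ and $b_2$ by bisection, which works here only because each partial derivative happens to be monotone on the interval in question; the mean value theorem itself guarantees existence but gives no such recipe.

```python
import math

def f(x, y):
    return x ** 2 * y + math.sin(y)

def d1f(x, y):
    # Partial derivative of f with respect to x.
    return 2 * x * y

def d2f(x, y):
    # Partial derivative of f with respect to y.
    return x ** 2 + math.cos(y)

def bisect(g, lo, hi, tol=1e-12):
    # Plain bisection; assumes g(lo) and g(hi) have opposite signs.
    glo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        gm = g(mid)
        if (gm > 0) == (glo > 0):
            lo, glo = mid, gm
        else:
            hi = mid
    return 0.5 * (lo + hi)

a1, a2, h1, h2 = 1.0, 0.5, 0.2, 0.3

# First term: only x varies, y is frozen at a2 + h2.
t1 = f(a1 + h1, a2 + h2) - f(a1, a2 + h2)
b1 = bisect(lambda b: h1 * d1f(b, a2 + h2) - t1, a1, a1 + h1)

# Second term: only y varies, x is frozen at a1.
t2 = f(a1, a2 + h2) - f(a1, a2)
b2 = bisect(lambda b: h2 * d2f(a1, b) - t2, a2, a2 + h2)

lhs = f(a1 + h1, a2 + h2) - f(a1, a2)
rhs = h1 * d1f(b1, a2 + h2) + h2 * d2f(a1, b2)
print(b1, b2, lhs, rhs)
```

Here $\mathbf{c}_1$ corresponds to the point `(b1, a2 + h2)` and $\mathbf{c}_2$ to `(a1, b2)`; both approach $\mathbf{a}$ as `h1` and `h2` shrink.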

1.10 Exercises for Chapter One 


Exercises for Section 1.1: 
Vectors 


1.1.1 Compute the following vectors by coordinates and sketch what you did: 


(a) 


+ 


(b) 2 


(c) 


1.1.2 Compute the following vectors: 



*3’ 


r 


(a) 

7T 

1 

+ 

-i 

.A 

(b) 


4 

c 


+ e 2 


1.1.3 Name the two trivial subspaces of R n . 


2 

1 


(d) 


+ ®i 


(c) 


rn 

4 

c 

L2J 


- e 4 


1.1.4 Which of the following lines are subspaces of R 2 (or R n )? For any that 
is not, why not? 


(a) y = —2x - 5 (b) y = 2x + 1 

1.1.5 Sketch the following vector fields: 



(a) 

(d) 

(g) 

0 ) 




128 Chapter 1. Vectors, Matrices, and Derivatives 


1.1.6 Suppose that in a circular pipe of radius a, water is flowing in the 
direction of the pipe, with speed a 2 - r 2 , where r is the distance to the axis of 
the pipe. 

(a) Write the vector field describing the flow if the pipe is in the direction of 
the $z$-axis.

(b) Write the vector field describing the flow if the axis of the pipe is the 
unit circle in the (x, y)-plane. 


Exercises for Section 1.2: 
Matrices 


In Exercise 1.2.2, remember to 
use the format: 


1 2 3 
4 5 6 


7 8 
9 0 
1 2 


A = 


B = 

Matrices for Exercise 1.2.4 


1 2 0 
3 1 -1 

'2 5 1 
1 4 2 
1 3 3 


1.2.1 (a) What are the dimensions of the following matrices? 

« 


a b c 
d e / 



r - 


7 r 1 

(b) 

4 1 

0 2 
• ■ 

(c) 

0 1 

1 0 

to « 



'l 

0 

0 

f 


‘1 

0 

o' 

(d) 

0 

1 

0 

1 

(e) 

0 

1 

0 


1 

0 

1 

0 


0 

0 

1 


(b) Which of the above matrices can be multiplied together? 

1.2.2 Perform the following matrix multiplications when it is possible. 


(a) 


(c) 


(«) 


12 3 

4 5 6 

1 -1 
-1 0 

-1 1 


7 8 
9 0 
1 2 


(b) 


1 

0 


2 

3 


1 4 
-1 3 
—2 2 


1 

2 

1 


0 

-1 

2 


1 -1 

1 2 

0 -2 


; (d) 


7 1 

-1 0 
2 3 


5 
-4 


1 

0 


2 

3 


1 

-1 


4 

3 


0 1 
-1 3 


; (0 


0 2 1 
1 3 2 


0 

3 5 


i] 


1.2.3 Compute the following without doing any arithmetic, 
(a) 


7 

6 

3 


2 v/3 4 

8 a 2 2 
y/S a 7 


1 

■0" 

r 


1 

0 

(b) 

-1 

.0. 

L 


6a 2 3a 2 

4 2 y/a 2 

5 12 3 


©2 (C) 


2 

3 


1 

2 


8 6 
v/3 4 


©3 


1.2.4 Given the matrices $A$ and $B$ in the margin at left,

(a) Compute the third column of AB without computing the entire matrix 
AB. 

(b) Compute the second row of AB, again without computing the entire 
matrix AB. 


1.2.5 For what values of a do the matrices 


A = 


1 1 
1 0 


and B = 


1 0 
a 1 


satisfy AB = BA? 

1 1 01

C = 10 1 

1 1 0 


Matrices for Exercise 1.2.9 


Recall that Mat $(n, m)$ denotes
the set of $n \times m$ matrices.


1.2.6 For what values of a and b do the matrices 


A = 


1 

a 


a 

0 


and B = 


1 

b 


0 

1 


satisfy $AB = BA$?


1.2.7 From the matrices below, find those that are transposes of each other.



'12 3 1 


lxl 


'lx 2 2 ■ 

(a) 

x o \/5 

(b) 

2 0 \/5 

(c) 

x 0 \/3 


1 x 2 2 


3 x 2 2 


1 2 3 


3 %/3 2 " 


1 x r 


'1 2 3 ' 

(d) 

2 0 x 2 

(e) 

x 2 0 2 

(0 

x 0 x 2 


lxl 


2 n/3 3 


i v/3 2 


1.2.8 Given the two matrices A = 
(a) What are their transposes? 


0 

0 


and B = 


1 

2 


0 1 
1 0 


(b) Without computing $AB$, what is $(AB)^\top$?

(c) Confirm your result by computing $AB$.

(d) What happens if you do part (b) using the incorrect formula $(AB)^\top = A^\top B^\top$?


1.2.9 Given the matrices $A$, $B$, and $C$ at left, which of the following expressions
make no sense?

(a) $AB$ (b) $BA$ (c) $A + B$ (d) $AC$

(e) $BC$ (f) $CB$ (g)^ (h) $B^\top A$ (i) $B^\top C$

1.2.10 Show that if A and B are upper-triangular n x n matrices, then so is 
AB. 


1.2.11 (a) What is the inverse of the matrix A = 


a b 
0 a 


for a ^ 0? 


(b) If we identify Mat $(2, 2)$ with $\mathbb{R}^4$ in the standard way, what is the angle
between $A$ and $A^{-1}$? Under what condition are $A$ and $A^{-1}$ orthogonal?

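1.2.12 Confirm by matrix multiplication that the inverse of

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad \text{is} \quad A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$$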


1.2.13 Prove that a matrix 


a b 
c d 


is not invertible if $ad - bc = 0$.


1.2.14 Prove Theorem 1.2.17: that the transpose of a product is the product 
of the transposes in reverse order: 

$(AB)^\top = B^\top A^\top$.



Graphs for Exercise 1.2.18 


1.2.15 Recall from Proposition 1.2.23, and the discussion preceding it, what
the adjacency matrix of a graph is.

(a) Compute the adjacency matrix $A_T$ for a triangle and $A_S$ for a square.

(b) For each of these, compute the powers up to 5, and explain the meaning
of the diagonal entries.

(c) For the triangle, you should observe that the diagonal terms differ by 1
from the off-diagonal terms. Can you prove that this will be true for all powers
of $A_T$?

(d) For the square, you should observe that half the terms are 0 for even
powers, and the other half are 0 for odd powers. Can you prove that this will
be true for all powers of $A_S$?

*(e) Show that half the terms of the powers of an adjacency matrix will be 0 
for even powers, and the other half are 0 for odd powers, if and only if you can 
color the vertices in two colors, so that every edge joins a vertex of one color to 
a vertex of the other. 

1.2.16 (a) For the adjacency matrix $A$ corresponding to the cube (shown in
Figure 1.2.6), compute $A^2$, $A^3$ and $A^4$. Check directly that $(A^2)(A^2) = (A^3)A$.

(b) The diagonal entries of A 4 should all be 21; count the number of walks 
of length 4 from a vertex to itself directly. 

(c) For this same matrix A, some entries of A n are always 0 when n is even, 
and others (the diagonal entries for instance) are always 0 when n is odd. Can 
you explain why? Think of coloring the vertices of the cube in two colors, so 
that each edge connects vertices of opposite colors. 

(d) Is this phenomenon true for $A_T$, $A_S$? Explain why, or why not.

1.2.17 Suppose we redefined a walk on the cube to allow stops: in one time
unit you may either go to an adjacent vertex, or stay where you are.

(a) Find a matrix $B$ such that $B^n_{i,j}$ counts the walks from $V_i$ to $V_j$ of length
$n$.

(b) Compute $B^2$, $B^3$ and explain the diagonal entries of $B^3$.

1.2.18 Suppose all the edges of a graph are oriented by an arrow on them. 
We allow multiple edges joining vertices, so that there might be many (a su- 
perhighway) joining two vertices, or two going in opposite directions (a 2-way 
street). Define the oriented adjacency matrix to be the square matrix with both 
rows and columns labeled by the vertices, where the $(i, j)$ entry is $m$ if there
are m oriented edges leading from vertex i to vertex j. 

What are the oriented adjacency matrices of the graphs at left? 

“Stars” indicate difficult exercises.


Exercises for Section 1.3: 

A Matrix as a Transformation 


1.2.19 An oriented walk of length $n$ on an oriented graph consists of a sequence
of vertices $V_0, V_1, \dots, V_n$ such that $V_i, V_{i+1}$ are, respectively, the beginning
and the end of an oriented edge.

(a) Show that if $A$ is the oriented adjacency matrix of an oriented graph,
then the $(i, j)$ entry of $A^n$ is the number of oriented walks of length $n$ going
from vertex $i$ to vertex $j$.

(b) What does it mean for the oriented adjacency matrix of an oriented graph 
to be upper triangular? lower triangular? diagonal? 


1.2.20 (a) Show that 


(b) Show that the matrix 





0 

0’ 

[a 1 

O' 

is a left inverse of 

1 

0 

[ b 0 

1 

0 

1 


0 0 
1 0 
0 1 


has no right inverse. 


(c) Find a matrix that has infinitely many right inverses. (Try transposing.)


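1.2.21 Show that

$$\begin{bmatrix} 1 & a & b \\ 0 & 1 & c \\ 0 & 0 & 1 \end{bmatrix} \quad \text{has an inverse of the form} \quad \begin{bmatrix} 1 & x & y \\ 0 & 1 & z \\ 0 & 0 & 1 \end{bmatrix}$$

and find it.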

*1.2.22 What $2 \times 2$ matrices $A$ satisfy

$A^2 = 0$, $A^2 = I$, $A^2 = -I$?

1.3.1 Are the following true functions? That is, are they both everywhere 
defined and well defined? 

(a) “The aunt of,” from people to people. 

(b) f(x) — from real numbers to real numbers. 

(c) “The capital of,” from countries to cities (careful: at least two countries,
the Netherlands and Bolivia, have two capitals.)

1.3.2 (a) Give one example of a linear transformation $T : \mathbb{R}^4 \to \mathbb{R}^2$.

(b) What is the matrix of the linear transformation $S_1 : \mathbb{R}^3 \to \mathbb{R}^3$ corresponding
to reflection in the plane of equation $x = y$? What is the matrix
corresponding to reflection $S_2 : \mathbb{R}^3 \to \mathbb{R}^3$ in the plane $y = z$? What is the
matrix of $S_1 \circ S_2$?


1.3.3 Of the functions in Exercise 1.3.1, which are onto? One to one? 


1.3.4 (a) Make up a non-mathematical transformation that is bijective (both 

onto and one to one), (b) Make up a mathematical transformation that is 
bijective. 

1.3.5 (a) Make up a non-mathematical transformation that is onto but not
one to one. (b) Make up a mathematical transformation that is onto but not 
one to one. 

1.3.6 (a) Make up a non-mathematical transformation that is one to one but
not onto. (b) Make up a mathematical transformation that is one to one but
not onto. 

1.3.7 The transformation $f(x) = x^2$ from real numbers to real positive numbers
is onto but not one to one.

(a) Can you make it 1-1 by changing its domain? By changing its range? 

(b) Can you make it not onto by changing its domain? By changing its 
range? 

1.3.8 Which of the following are characterized by linearity? Justify your
answer. 

(a) The increase in height of a child from birth to age 18. 

(b) “You get what you pay for.” 

(c) The value of a bank account at 5 percent interest, compounded daily, as 
a function of time. 

(d) “Two can live as cheaply as one.” 

(e) “Cheaper by the dozen” 

1.3.9 For each of the following linear transformations, what must be the
dimensions of the corresponding matrix?

(a) $T : \mathbb{R}^2 \to \mathbb{R}^3$ (b) $T : \mathbb{R}^3 \to \mathbb{R}^3$

(c) $T : \mathbb{R}^4 \to \mathbb{R}^2$ (d) $T : \mathbb{R}^4 \to \mathbb{R}$

1.3.10 For the matrices at left, what is the domain and range of the corre-
sponding transformation? 

1.3.11 For a class of 150 students, grades on a mid-term exam, 10 homework
assignments, and the final were entered in matrix form, each row corresponding
to a student, the first column corresponding to the grade on the mid-term, the
next 10 columns corresponding to grades on the homeworks and the last column
corresponding to the grade on the final. The final counts for 50 percent, the
mid-term counts for 25 percent, and each homework for 1.5 percent of the final
grade. What is the transformation $T : \mathbb{R}^{12} \to \mathbb{R}$ that assigns to each student
his or her final grade?

1.3.12 Perform the composition $f \circ g \circ h$ for the following functions and values
of $x$.

(a) $f(x) = x^2 - 1$, $g(x) = 3x$, $h(x) = -x + 2$, for $x = 2$.


In Exercise 1.3.9, remember 
that the height of a matrix is given 
first: a 3 x 2 matrix is 3 tall and 2 
wide. 


'l 3 0 r 

(a) A = 0 3 1 5 

.12 0 1 

ai bi 

0.2 b 2 

(b) B — 03 53 

04 64 

.05 65 

(c) C = \ n 10 ^1 

w lo -1 2 1 J 

£>=[l 0 -2 5]. 
Matrices for Exercise 1.3.10 

(b) $f(x) = x^2$, $g(x) = x - 3$, $h(x) = x - 3$, for $x = 1$.

1.3.13 Find the matrix for the transformation from $\mathbb{R}^3 \to \mathbb{R}^3$ that rotates by
30° around the $y$-axis.

1.3.14 Show that the mapping from $\mathbb{R}^n$ to $\mathbb{R}^m$ described by the product $A\mathbf{v}$
is indeed linear.


1.3.15 Use composition of transformations to derive from the transformation
in Example 1.3.17 the fundamental theorems of trigonometry:

$\cos(\theta_1 + \theta_2) = \cos\theta_1 \cos\theta_2 - \sin\theta_1 \sin\theta_2$


In Exercise 1.3.18 note that the symbol $\mapsto$ (to be read, “maps to”)
is different from the symbol $\to$ (to be read “to”). While $\to$ describes
the relationship between the domain and range of a mapping, as
in $T : \mathbb{R}^2 \to \mathbb{R}$, the symbol $\mapsto$ describes what a mapping does to a
particular input. One could write

$f(x) = x^2$ as $f : x \mapsto x^2$.


<•> [S] 

(b) 

v/2l 

1 ' 

(C) -1 

1. 

(d) 

’ r 

-2 

2 


Vectors for Exercise 1.4.2 


$\sin(\theta_1 + \theta_2) = \sin\theta_1 \cos\theta_2 + \cos\theta_1 \sin\theta_2$.

1.3.16 Confirm (Example 1.3.16) by matrix multiplication that reflecting a 
point across the line, and then back again, lands you back at the original point. 

1.3.17 If A and B are n x n matrices, their Jordan product is 

AB + BA 
2 

Show that this product is commutative but not associative. 

1.3.18 Consider $\mathbb{R}^2$ as identified to $\mathbb{C}$ by identifying $\begin{pmatrix} a \\ b \end{pmatrix}$ to $z = a + ib$.

Show that the following mappings $\mathbb{C} \to \mathbb{C}$ are linear transformations, and
give their matrices:

(a) $\Re : z \mapsto \Re(z)$ (the real part of $z$);

(b) $\Im : z \mapsto \Im(z)$ (the imaginary part of $z$);

(c) $c : z \mapsto \bar{z}$ (the complex conjugate of $z$, i.e., $\bar{z} = a - ib$ if $z = a + ib$);

(d) $m_w : z \mapsto wz$, where $w = u + iv$ is a fixed complex number.

1.3.19 Show that the set of complex numbers $\{z \mid \Re(wz) = 0\}$ with fixed
$w \in \mathbb{C}$ is a subspace of $\mathbb{R}^2 = \mathbb{C}$. Describe this subspace.


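Exercises for Section 1.4:
Geometry in $\mathbb{R}^n$

1.4.1 If $\mathbf{v}$ and $\mathbf{w}$ are vectors, and $A$ is a matrix, which of the following are
numbers? Which are vectors?

$\mathbf{v} \times \mathbf{w}$; $\mathbf{v} \cdot \mathbf{w}$; $|\mathbf{v}|$; $|A|$; $\det A$; $A\mathbf{v}$.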

1.4.2 What are the lengths of the vectors in the margin? 

1.4.3 (a) What is the angle between the vectors (a) and (b) in Exercise 1.4.2? 
(b) What is the angle between the vectors (c) and (d) in Exercise 1.4.2? 

1.4.4 Calculate the angles between the following pairs of vectors: 


Figure 1.4.6. 

(When figures and equations 
are numbered in the exercises, 
they are given the number of the 
exercise to which they pertain.) 



• • 


1 

0 2 

(a) 

3 1 

0 2 

(b) 

2 

4 1 




.0 

1 3. 


'-2 5 

3' 


(c) 

-1 3 

4 



—2 3 

7 



’1 2 

-6' 


(d) 

0 1 

-3 



1 0 

-2 


Matrices for Exercise 1.4.9 


r 1 2 

3 


(a) 

-1 1 

1 



l 2 2 

2 



a b 

c 


(b) 

0 d 

€ 



0 0 

/. 



a b 

O' 


(c) 

c. d 

0 



. e f 

9 . 



Matrices for Exercise 1.4.11 




V 


0 

> 

0 


m J 


lim„ 

— *OG 


1 

1 

1 


(b) 



(angle between 


1 

0 

1 

0 


n 
1 
1 
LI 


*r 


T 

0 


1 

0 


1 

0 


1 

• 

• * « 


• 


as vectors in R n ). 


1.4.5 Let P be the parallelepiped 0 < x < a, 0 < y < b, 0 < z < c. 

(a) What angle does a diagonal make with the sides? What relation is there 
between the length of a side and the corresponding angle? 


(b) What are the angles between the diagonal and the faces of the paral- 
lelepiped? What relation is there between the area of a face and the corre- 
sponding angle? 


1.4.6 (a) Prove Proposition 1.4.14 in the case where the coordinates $\mathbf{a}$ and $\mathbf{b}$
are positive, by subtracting pieces 1-6 from $(a_1 + b_1)(a_2 + b_2)$, as suggested by
Figure 1.4.6.

(b) Repeat for the case where $b_1$ is negative.


1.4.7 (a) Find the equation of the line in the plane through the origin and 

T 2l 

perpendicular to the vector ^ . 

(b) Find the equation of the line in the plane through the point ^3 j and 

2 

perpendicular to the vector ^ . 

1.4.8 (a) What is the length of $\mathbf{v}_n = \mathbf{e}_1 + \cdots + \mathbf{e}_n \in \mathbb{R}^n$?

(b) What is the angle $\alpha_n$ between $\mathbf{v}_n$ and $\mathbf{e}_1$? What is $\lim_{n \to \infty} \alpha_n$?


1.4.9 Compute the determinants of the matrices at left. 


1.4.10 


1.4.11 


Compute the determinants of the matrices 



(b) 


1 1 
1 1 



a b 
0 d 


Compute the determinants of the matrices in the margin at left. 


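1.4.12 Confirm the following formula for the inverse of a $3 \times 3$ matrix:

$$\begin{bmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{bmatrix}^{-1} = \frac{1}{\det A} \begin{bmatrix} b_2c_3 - b_3c_2 & b_3c_1 - b_1c_3 & b_1c_2 - b_2c_1 \\ a_3c_2 - a_2c_3 & a_1c_3 - a_3c_1 & a_2c_1 - a_1c_2 \\ a_2b_3 - a_3b_2 & a_3b_1 - a_1b_3 & a_1b_2 - a_2b_1 \end{bmatrix}.$$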

1.4.13 (a) What is the area of the parallelogram with vertices at

(b) What is the area of the parallelogram with vertices at 

(*)•(!)•(-?)•(«' 

1.4.14 Compute the following cross products: 



r ^ *1 

2x 


r*i 


1 


m «■ 

2 


21 


3 

w 

-v 

3z . 

X 

1 

ft ft 
<M ° 

i 

(b) 

2 

5 

X 

0 

3 

(c) 

1 

i-H <0 

1 1 
1 

X 

0 

2 


1.4.15 Show that the cross product of two vectors pointing in the same di- 
rection is zero. 



T 


'2' 


r n 

1.4.16 Given the vectors d — 

2 

1 

. - 

, v = 

0 

1 

, w = 

i 

i 

*— o 

1 


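(a) Compute $\mathbf{u} \times (\mathbf{v} \times \mathbf{w})$ and $(\mathbf{u} \times \mathbf{v}) \times \mathbf{w}$.

(b) Confirm that $\mathbf{v} \cdot (\mathbf{v} \times \mathbf{w}) = 0$. What is the geometrical relationship of $\mathbf{v}$
and $\mathbf{v} \times \mathbf{w}$?

1.4.17 Given two vectors $\mathbf{v}$ and $\mathbf{w}$, show that $(\mathbf{v} \times \mathbf{w}) = -(\mathbf{w} \times \mathbf{v})$.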


1.4.18 Let $A$ be a $3 \times 3$ matrix with columns $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$, and let $Q_A$ be the $3 \times 3$
matrix with rows


In part (c) of Exercise 1.4.18 
think of the geometric definition of 
the cross product, and the defini- 
tion of the determinant of a 3 x 3 
matrix in terms of cross products. 


$(\mathbf{b} \times \mathbf{c})^\top$, $(\mathbf{c} \times \mathbf{a})^\top$, $(\mathbf{a} \times \mathbf{b})^\top$.


(a) Compute $Q_A$ when


A = 


1 

0 

1 




(b) What is the product $Q_A A$ in the case of (a) above?

(c) What is $Q_A A$ for any $3 \times 3$ matrix $A$?

(d) Can you relate this problem to Exercise 1.4.12? 


1.4.19 (a) What is the length of

$$\mathbf{w}_n = \mathbf{e}_1 + 2\mathbf{e}_2 + \cdots + n\mathbf{e}_n = \sum_{i=1}^{n} i\,\mathbf{e}_i\,?$$

(b) What is the angle $\alpha_{n,k}$ between $\mathbf{w}_n$ and $\mathbf{e}_k$?

*(c) What are the limits

$$\lim_{n \to \infty} \alpha_{n,k}, \qquad \lim_{n \to \infty} \alpha_{n,n}, \qquad \lim_{n \to \infty} \alpha_{n,[n/2]},$$

where $[n/2]$ stands for the largest integer not greater than $n/2$?
1.4.20 For the two matrices and the vector 


1 o' 

, B = 

'2 O' 

, c = 

T 

1 2 


0 1 




(a) compute $|A|$, $|B|$, $|\mathbf{c}|$;

(b) confirm that: $|AB| \le |A||B|$, $|A\mathbf{c}| \le |A||\mathbf{c}|$, and $|B\mathbf{c}| \le |B||\mathbf{c}|$.

1.4.21 Use direct computation to prove Schwarz’s inequality (Theorem 1.4.6)
in $\mathbb{R}^2$ for the standard inner product (dot product); i.e., show that for any
numbers $x_1, x_2, y_1, y_2$, we have

$$|x_1y_1 + x_2y_2| \le \sqrt{x_1^2 + x_2^2}\,\sqrt{y_1^2 + y_2^2}.$$


Exercises for Section 1.5: 
Convergence and Limits 


B = 


1 e 
0 1 
0 0 



Matrix for Exercise 1.5.1 


1 

+e 1 


Matrix for Exercise 1.5.2 


1.5.1 Find the inverse of the matrix $B$ at left, by finding the matrix $A$ such
that $B = I - A$ and computing the value of the series $S = I + A + A^2 + A^3 + \cdots$.
This is easier than you might fear!


1.5.2 Following the procedure in Exercise 1.5.1, compute the inverse of the 
matrix B at left, where |c| < 1, using a geometric series. 


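1.5.3 Suppose $\sum_{i=1}^{\infty} \mathbf{x}_i$ is a convergent series in $\mathbb{R}^n$. Show that the triangle
inequality applies:

$$\left| \sum_{i=1}^{\infty} \mathbf{x}_i \right| \le \sum_{i=1}^{\infty} |\mathbf{x}_i|.$$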


1.5.4 Let $A$ be a square $n \times n$ matrix, and define

$$e^A = \sum_{k=0}^{\infty} \frac{1}{k!} A^k.$$

(a) Show that the series converges for all $A$, and find a bound for $|e^A|$ in
terms of $|A|$ and $n$.

(b) Compute explicitly $e^A$ for the following values of $A$:

(1) $\begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}$ (2) $\begin{bmatrix} 0 & a \\ 0 & 0 \end{bmatrix}$ (3) $\begin{bmatrix} 0 & a \\ -a & 0 \end{bmatrix}$

For the third above, you might look up the power series for $\sin x$ and $\cos x$.

(c) Prove the following, or find counterexamples:

(1) Do you think that $e^{A+B} = e^A e^B$ for all $A$ and $B$? What if $AB = BA$?

(2) Do you think that $e^{2A} = (e^A)^2$ for all $A$?


1.5.5 For each of the following subsets X of R and R 2 , state whether it is 
open or closed (or both or neither), and prove it. 

(a) $\{x \in \mathbb{R} \mid 0 < x < 1\}$

(b) $\left\{ \begin{pmatrix} x \\ y \end{pmatrix} \in \mathbb{R}^2 \;\middle|\; 1 < x^2 + y^2 < 2 \right\}$

(C) {(®)eP. 2 |x^o} 

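(d) $\left\{ \begin{pmatrix} x \\ y \end{pmatrix} \in \mathbb{R}^2 \;\middle|\; y = 0 \right\}$

*(e) $\{\mathbb{Q} \subset \mathbb{R}\}$ (the rational numbers)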


1.5.6 (a) Show that the expression 


\f x \ 

mi 2 

IU 2 J- 

UJI 


is a polynomial p(x) of degree 4. and compute it. 

(b) Use a computer to plot it; observe that it has two minima and a maxi- 

mum. Evaluate approximately the absolute minimum: you should find some- 
thing like .0663333 

(c) What does this say about the radius of the largest disk centered at ^ ^ ) 

which does not intersect the parabola of equation $y = x^2$? Is the number 1/12
found in Example 1.5.6 sharp?

(d) Can you explain the meaning of the other local maxima and minima? 


.5.7 Find a formula for the radius r of the largest disk centered at ^ ^ ) that 


doesn’t intersect the parabola of equation y = x 2 , using the following steps: 

(a) Find the distance squared as a 4th degree polynomial in $x$.


(b) Find the zeroes of the derivative by the method of Exercise 0.6.6. 

(c) Find r. 


1.5.8 For each of the following formulas, find its natural domain, and show 
whether it is open, closed or neither. 

(a) sin ~ (b) log \fx 2 — y (c) log log x 

(d) arcsin (e) \/e C08I v (f) ^ 

1.5.9 What is the natural domain of the function

$\lim_{x \to 0} (1 + x)^{1/x}$ of Example 1.5.18?

1.5.10 (a) Show that if $U \subset \mathbb{R}^n$ is open, and $V$ is an open subset of $U$, then
$V$ is an open subset of $\mathbb{R}^n$.

(b) Show that if $A$ is a closed subset of $\mathbb{R}^n$ and $B$ is closed in $A$, then $B$ is
closed in $\mathbb{R}^n$.


1.5.11 Show that if $X$ is a subset of $\mathbb{R}^n$, then the closure $\overline{X}$ is closed.

1.5.12 Suppose that $\varphi : (0, \infty) \to (0, \infty)$ is a function such that $\lim_{\epsilon \to 0} \varphi(\epsilon) = 0$.

(a) Let $\mathbf{a}_1, \mathbf{a}_2, \dots$ be a sequence in $\mathbb{R}^n$. Show that this sequence converges
to $\mathbf{a}$ if and only if for any $\epsilon > 0$, there exists $N$ such that for $n > N$, we have
$|\mathbf{a}_n - \mathbf{a}| < \varphi(\epsilon)$.

(b) Find an analogous statement for limits of functions.

1.5.13 Prove the converse of Proposition 1.5.14: i.e., prove that if every convergent
sequence in a set $C \subset \mathbb{R}^n$ converges to a point in $C$, then $C$ is closed.


1.5.14 State whether the following limits exist, and prove it. 


(a) , lim 


(C) 1*1” 


(b) , lira 


(;)-(:) 


\/Hy 

x 2 + y 2 


(d) lim x 2 + y 3 - 3 

(:)-(!) 


(e) . 


(:)-(:) 


x + y 
o\ x2 ~V 2 


(x 2 + y 2 ) 2 
X + y 


(o , i™ 


(:)-(:) 


* (g) lim (x 2 + y 2 )(log \xy\), defined when xy ^ 0. 

(:)-(:) 

(h) lim (x 2 + y 2 ) log(x 2 + y 2 ) 

(;)-(:) 

1.5.15 (a) Let D * C M 2 be the region 0 < x 2 + y 2 < 1, and let / : D* 

be a function. What does the following assertion mean? 

lim / ( ) = a 

(:)-(:) 


(b) For the two functions below, either show that the limit exists and find 
it, or show that the limit does not exist: 

f ( X v) = 700 p(v) = (W-Hvl)io g (^ + y 4 ) 


( y ) = (|x| + |y|) log(x 2 + y 4 ) 


1.5.16 Prove Theorem 1.5.13. 

1.5.17 Prove Proposition 1.5.16: If a sequence $\mathbf{a}_k$ converges to $\mathbf{a}$, then any
subsequence converges to the same limit.

1.5.18 (a) Show that the function $f(x) = |x|e^{-|x|}$ has an absolute maximum
at some $x > 0$.

(b) What is the maximum of the function?

(c) Show that the image of $f$ is $[0, 1/e]$.

1.5.19 Prove Theorem 1.5.27. 


$$A_m = \begin{bmatrix} \cos m\theta & \sin m\theta \\ -\sin m\theta & \cos m\theta \end{bmatrix}$$

Sequence for Exercise 1.5.20


1.5.20 For what numbers $\theta$ does the sequence of matrices $A_m$ (shown at left)
converge? When does it have a convergent subsequence?

1.5.21 (a) Let Mat $(n, m)$ denote the space of $n \times m$ matrices, which we will
identify with $\mathbb{R}^{nm}$. For what numbers $a \in \mathbb{R}$ does the sequence of matrices
$A^n \in$ Mat $(2, 2)$ converge as $n \to \infty$, when $A = \begin{bmatrix} a & a \\ a & a \end{bmatrix}$? What is the limit?

(b) What about $3 \times 3$ matrices filled with $a$’s, or $n \times n$ matrices?

1.5.22 Let $U \subset$ Mat $(2, 2)$ be the set of matrices $A$ such that $I - A$ is invertible.

(a) Show that $U$ is open, and find a sequence in $U$ that converges to $I$.

(b) Consider the mapping $f : U \to$ Mat $(2, 2)$ given by

$f(A) = (A^2 - I)(A - I)^{-1}$.

Does $\lim_{A \to I} f(A)$ exist? If so, what is the limit?

1 0 


*(c) Let B = ^ c Mat (2, 2) be the set of matrices A 

such that A - B is invertible. Again, show that V is open, and that B can be 
approximated by elements of V. 

*(d) Consider the mapping $g : V \to$ Mat $(2, 2)$ given by

$g(A) = (A^2 - B^2)(A - B)^{-1}$.

Does $\lim_{A \to B} g(A)$ exist? If so, what is the limit?


1.5.23 (a) Show that the matrix A = 

ping IR 2 R 2 . 

*(b) Find an explicit 6 in terms of e. 
(c) Now show that the mapping 


2 

0 


2 

1 


represents a continuous map- 


xl [a b 


X 

y [c d 


y. 


is continuous for any a, 6, d, c. 


f3 i4 

1.5.24 Let a n = 2Vt> ’ *‘ e "’ ^ wo en ^ I ’l es 3X6 ?r and e, to n places. 

■ m 


How large does M have to be so that 


have to be so that 


®n 


7T 

e 


a„ - 


7T 

e 


< 10 3 ? How large does M 


< 10 " 4 ? 

Exercises for Section 1.6:
Four Big Theorems


1.5.25 


Which of the following functions are continuous at 




1 

x 2 + y 2 + 1 


(c) / ( y ) = (x 2 + y 2 ) log \x + y\ 



\/x 2 + y 2 
\x\ + \y \ l/ 3 


( b )/(^) = v /l -^ 2 -!/ 2 

(d) / ( y ) = (x 2 + y 2 ) log(i 2 + 2y 2 ) 


1.6.1 Let $A \subset \mathbb{R}^n$ be a subset that is not compact. Show that there exists a
continuous unbounded function on $A$.

Hint: If $A$ is not bounded, then consider $f(\mathbf{x}) = |\mathbf{x}|$. If $A$ is not closed, then
consider $f(\mathbf{x}) = 1/|\mathbf{x} - \mathbf{a}|$ for an appropriate $\mathbf{a}$.

1.6.2 In the proof of the fundamental theorem of algebra (Theorem 1.6.10),
justify the statement (Equation 1.6.25) that

$|b_{j+1}u^{j+1} + \cdots + u^k| < |b_j u^j|$ for small $u$.

1.6.3 Set $z = x + iy$, where $x, y \in \mathbb{R}$. Show that the polynomial

$p(z) = 1 + x^2y^2$

has no roots. Why doesn’t this contradict Theorem 1.6.10?

1.6.4 Find, with justification, a number $R$ such that there is a root of
$p(z) = z^5 + 4z^3 + 3iz - 3$ in the disk $|z| < R$. (You may use that a minimum of $|p|$ is
a root of p.) 

1.6.5 Consider the polynomial

$p(z) = z^6 + 4z^4 + z + 2 = z^6 + q(z)$.

(a) Find $R$ such that $|z^6| > |q(z)|$ when $|z| > R$.

(b) Find a number $R_1$ such that you are sure that the minimum of $|p(z)|$
occurs for $|z| < R_1$.

1.6.6 Prove that: 

(a) Every polynomial over the complex numbers can be factored into linear 
factors. 

(b) Every polynomial over the real numbers can be factored into linear factors 
and quadratic factors with complex roots. 

1.6.7 Find a number $R$ for which you can prove that the polynomial

$p(z) = z^{10} + 2z^9 + 3z^8 + \cdots + 10z + 11$

has a root for $|z| < R$. Explain your reasoning.

Exercises for Section 1.7:
Differential Calculus 


You really must learn the nota- 
tion for partial derivatives used in 
Exercise 1.7.7, as it is used prac- 
tically everywhere, but we much 
prefer Djf, etc. 


1.7.1 Find the equation of the line tangent to the graph of $f(x)$ at $\begin{pmatrix} a \\ f(a) \end{pmatrix}$
for the following functions:

(a) $f(x) = \sin x$, $a = 0$ (b) $f(x) = \cos x$, $a = \pi/3$

(c) $f(x) = \cos x$, $a = 0$ (d) $f(x) = 1/x$, $a = 1/2$.

1.7.2 For what $a$ is the tangent to the graph of $f(x) = e^{-x}$ at $\begin{pmatrix} a \\ e^{-a} \end{pmatrix}$ a line
of the form $y = mx$?

1.7.3 Example 1.7.2 may lead you to expect that if $f$ is differentiable at $a$,
then $f(a + h) - f(a) - f'(a)h$ has something to do with $h^2$. It is not true that
once you get rid of the linear term you always have a term that includes $h^2$.
Try computing the derivative of

(a) $f(x) = |x|^{3/2}$ at 0 (b) $f(x) = x \log |x|$ at 0 (c) $f(x) = x/\log |x|$, also at 0


1.7.4 Find $f'(x)$ for the following functions $f$.


(a) /(x) 
(c) f(x) 

(e) f(x) 


sin 3 (x 2 4 cosx) 

(cos x) 4 sin x 

sin x 2 sin 3 x 
2 -I- sin x 


(b) f{x) 
(d) /(x) 

(o m 


cos 2 ((x 4 sinx) 2 ) 
(x + sin 4 x) 3 


= sin (—^7) 
\s1nx7 


1*7.5 Using Definition 1.7.1, show that n/J 2 and \/x* are not differentiable 
at 0, but that %/x 4 is. 


1.7.6 What are the partial derivatives D\f and D 2 f of the following func- 
tions, at the points and 


( a ) f (^ ) = V x2 + y\ 

( c ) / (y ) = cosxj t 4 y cosy; 

1.7.7 Calculate the partial derivatives 


< b ) /(£)=*** + ✓, 


Of A & 

dx ’ &y ^ or vect;or * valued functions: 


1.7.8 



Write the answers to Exercise 1.7.7 in the form of the Jacobian matrix. 

1.7.9 (a) Given a vector-valued function

f ( y ) = ( h ) ’ With JaCobian matrix 2X COS y% + y) y) ’ 

what is D\ of the function /i? D 2 of the function / 1 ? D 2 of / 2 ? 

(b) What are the dimensions of the Jacobian matrix of a vector- valued func- 
tion 



1.7.10 What is the derivative of the function $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^n$ given by the
formula $\mathbf{f}(\mathbf{x}) = |\mathbf{x}|^2 \mathbf{x}$?

1.7.11 Show that if $f(x) = |x|$, then for any number $m$,

$$\lim_{h \to 0} \bigl( f(0 + h) - f(0) - mh \bigr) = 0,$$

but that

$$\lim_{h \to 0} \frac{1}{h} \bigl( f(0 + h) - f(0) - mh \bigr)$$

never exists: there is no number $m$ such that $mh$ is a “good approximation” to
$f(h) - f(0)$ in the sense of Definition 1.7.7.

1.7.12 (a) Show that the mapping

Mat $(n, n) \to$ Mat $(n, n)$, $A \mapsto A^3$

is differentiable, and compute its derivative.

(b) Compute the derivative of the mapping

Mat $(n, n) \to$ Mat $(n, n)$, $A \mapsto A^k$ for any integer $k \ge 1$.

1.7.13 (a) Define what it means for a mapping $F :$ Mat $(n, m) \to$ Mat $(k, l)$
to be differentiable at a point $A \in$ Mat $(n, m)$.

(b) Consider the function $F :$ Mat $(n, m) \to$ Mat $(n, n)$ given by

$F(A) = AA^\top$.

Show that $F$ is differentiable, and compute the derivative $[\mathbf{D}F(A)]$.

1.7.14 Compute the derivative of the mapping $A \mapsto AA^\top$.

1.7.15 Let A = \ a *1 and A 2 = “ l 6 * 

[c d J [ Cl d, ' 

( a\ \

b 


b. 

c 


Ci 

\dj 


\<V 


The function of Exercise 1.7.15 


(a) Write the formula for the function $S : \mathbb{R}^4 \to \mathbb{R}^4$ defined at left.

(b) Find the Jacobian matrix of $S$.

(c) Check that your answer agrees with Example 1.7.15. 

(d) (For the courageous): Do the same for 3 x 3 matrices. 


1.7.16 


Which of the following functions are differentiable at 



(a)/(*)=sm(e*v) (b) / (*) 

(«) f(y) = l* + »l (<!) f(l) 



1.7.17 Find the Jacobian matrices of the following mappings: 


(.) / f J) " b} / ( J) * 

<■>'(;) -(.?,) «)'(») -o?) 



1.7.18 In Example 1.7.15, prove that the derivative AH + HA is the “same” 
as the Jacobian matrix computed with partial derivatives. 


1.7.19 (a) Let $U \subset \mathbb{R}^n$ be open and $\mathbf{f} : U \to \mathbb{R}^m$ be a mapping. When is $\mathbf{f}$
differentiable at $\mathbf{a} \in U$? What is its derivative?

(b) Is the mapping $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^n$ given by


Hint for 1.7.20 (a): Think of 




— 

^1 

, I as the element 

6 


c d\ 

c 

W 


of R 4 . Use the formula for com- 
puting the inverse of a 2 x 2 matrix 
(Equation 1.2.15). 


rc*> = 1 * 1 * 

differentiable at the origin? If so, what is its derivative? 

1.7.20 (a) Compute the derivative (Jacobian matrix) for the function f{A) = 
A~ l described in Proposition 1.7.16, when A is a 2 x 2 matrix. 

(b) Show that your result agrees with the result of Proposition 1.7.16. 

1.7.21 Considering the determinant as a function only of 2 x 2 matrices, i.e.,
$\det : \operatorname{Mat}(2, 2) \to \mathbb{R}$, show that

\[ [D\det(I)]H = h_{1,1} + h_{2,2}, \]

where I of course is the identity and H is the increment matrix

\[ H = \begin{bmatrix} h_{1,1} & h_{1,2} \\ h_{2,1} & h_{2,2} \end{bmatrix}. \]


144 Chapter 1. Vectors, Matrices, and Derivatives 


Exercises for Section 1.8: Rules for Computing Derivatives

1.8.1 (a) Prove Leibnitz’s rule (part (5) of Theorem 1.8.1) directly when
$f : U \to \mathbb{R}$ and $g : U \to \mathbb{R}^m$ are differentiable at a, by writing

\[ \lim_{h \to 0} \frac{\bigl| f(a + h)g(a + h) - f(a)g(a) - f(a)\bigl([Dg(a)]h\bigr) - \bigl([Df(a)]h\bigr) g(a) \bigr|}{|h|} \]

and then developing the term under the limit:

\[ \bigl(f(a + h) - f(a)\bigr)\,\frac{g(a + h) - g(a)}{|h|}
 + f(a)\,\frac{g(a + h) - g(a) - [Dg(a)]h}{|h|}
 + \frac{f(a + h) - f(a) - [Df(a)]h}{|h|}\, g(a). \]

(b) Prove the rule for differentiating dot products (part (6) of Theorem 1.8.1)
by a similar decomposition.

(c) Show by a similar argument that if $f, g : U \to \mathbb{R}^3$ are both differentiable
at a, then so is the cross product $f \times g : U \to \mathbb{R}^3$. Find the formula for this
derivative.

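1.8.2 (a) What is the derivative of the function

\[ f(t) = \int_t^{t^2} \frac{ds}{s + \sin s}, \]

defined for s > 1?

(b) When is f increasing or decreasing?

Hint for Exercise 1.8.2: think of the composition of

\[ t \mapsto \begin{pmatrix} t \\ t^2 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} x \\ y \end{pmatrix} \mapsto \int_x^y \frac{ds}{s + \sin s}, \]

both of which you should know how to differentiate.

1.8.3 Consider the function

\[ f\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \sum_{i=1}^{n-1} x_i x_{i+1} \]

and the curve $\gamma : \mathbb{R} \to \mathbb{R}^n$ given by

\[ \gamma(t) = \begin{pmatrix} t \\ t^2 \\ \vdots \\ t^n \end{pmatrix}. \]

What is the derivative of the function $t \mapsto f(\gamma(t))$?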

1.8.4 True or false? Justify your answer. If $f : \mathbb{R}^2 \to \mathbb{R}^2$ is a differentiable
function with

'©-(!) - M8)]-[! !]. 

there is no smooth mapping $g : \mathbb{R}^2 \to \mathbb{R}^2$ with

*(l) = (o) “d ,0 «(5)-(l)- 

1.8.5 Let $\varphi : \mathbb{R} \to \mathbb{R}$ be any differentiable function. Show that the function

\[ f\begin{pmatrix} x \\ y \end{pmatrix} = y\,\varphi(x^2 - y^2) \]

satisfies the equation

\[ \frac{1}{x} D_1 f\begin{pmatrix} x \\ y \end{pmatrix} + \frac{1}{y} D_2 f\begin{pmatrix} x \\ y \end{pmatrix} = \frac{1}{y^2} f\begin{pmatrix} x \\ y \end{pmatrix}. \]

1.8.6 (a) Show that if a function $f : \mathbb{R}^2 \to \mathbb{R}$ can be written $\varphi(x^2 + y^2)$ for
some function $\varphi : \mathbb{R} \to \mathbb{R}$, then it satisfies

\[ x D_2 f - y D_1 f = 0. \]

*(b) Show the converse: every function satisfying $x D_2 f - y D_1 f = 0$ can be
written $\varphi(x^2 + y^2)$ for some function $\varphi : \mathbb{R} \to \mathbb{R}$.

Hint for part (b): What is the “partial derivative of f with respect to the
polar angle θ”?

1.8.7 Referring to Example 1.8.4: (a) Compute the derivative of the map
$A \mapsto A^{-3}$;

(b) Compute the derivative of the map $A \mapsto A^{-n}$.

1.8.8 If $f\begin{pmatrix} x \\ y \end{pmatrix} = \varphi\!\left(\dfrac{x + y}{x - y}\right)$ for some differentiable function $\varphi : \mathbb{R} \to \mathbb{R}$, show
that

\[ x D_1 f + y D_2 f = 0. \]

1.8.9 True or false? Explain your answers. (a) If $f : \mathbb{R}^2 \to \mathbb{R}^2$ is differentiable,
and [Df(0)] is not invertible, then there is no function $g : \mathbb{R}^2 \to \mathbb{R}^2$ such that
$g \circ f(x) = x$.

(b) Differentiable functions have continuous partial derivatives.


Exercises for Section 1.9: Criteria for Differentiability

1.9.1 Show that the function

\[ f(x) = \begin{cases} \dfrac{x}{2} + x^2 \sin \dfrac{1}{x} & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases} \]

is differentiable at 0, with derivative f'(0) = 1/2.

1.9.2 (a) Show that for 


/«)- 


Hr 


■ (:) - (!) 

*(:)-(!)• 

all directional derivatives exist, but that / is not differentiable at the origin. 


(b) Show that there exists a function which has directional derivatives everywhere
and isn’t continuous, or even bounded. (Hint: Consider Example 1.5.24.)

Function for Exercise 1.9.3

Hint for Exercise 1.9.4, part (c): You may find the following fact useful:
$|\sin x| \le |x|$ for all $x \in \mathbb{R}$.

1.9.3 Consider the function defined on $\mathbb{R}^2$ given by the formula at left.

(a) Show that both partial derivatives exist everywhere. 

(b) Where is / differentiable? 


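1.9.4 Consider the function $f : \mathbb{R}^2 \to \mathbb{R}$ given by

\[ f\begin{pmatrix} x \\ y \end{pmatrix} = \begin{cases} \dfrac{\sin(x^2 y^2)}{x^2 + y^2} & \text{if } \begin{pmatrix} x \\ y \end{pmatrix} \neq \begin{pmatrix} 0 \\ 0 \end{pmatrix} \\[2ex] 0 & \text{if } \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \end{cases} \]

(a) What does it mean to say that f is differentiable at $\begin{pmatrix} 0 \\ 0 \end{pmatrix}$?

(b) Show that both partial derivatives $D_1 f\begin{pmatrix} 0 \\ 0 \end{pmatrix}$ and $D_2 f\begin{pmatrix} 0 \\ 0 \end{pmatrix}$ exist, and
compute them.

(c) Is f differentiable at $\begin{pmatrix} 0 \\ 0 \end{pmatrix}$?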

1.9.5 Consider the function defined on R 3 defined by the formulas 



(a) Show that all partial derivatives exist everywhere. 

(b) Where is / differentiable? 



2 

Solving Equations 


Some years ago, John Hubbard was asked to testify before a subcommittee 
of the U.S. House of Representatives concerned with science and technol- 
ogy. He was preceded by a chemist from DuPont who spoke of modeling 
molecules, and by an official from the geophysics institute of California, 
who spoke of exploring for oil and attempting to predict tsunamis. 

When it was his turn, he explained that when chemists model molecules,
they are solving Schrödinger’s equation, that exploring for oil requires
solving the Gelfand-Levitan equation, and that predicting tsunamis
means solving the Navier-Stokes equation. Astounded, the chairman of
the committee interrupted him and turned to the previous speakers. “Is
that true, what Professor Hubbard says?” he demanded. “Is it true that
what you do is solve equations?”


2.0 Introduction 


In every subject, language is in- 
timately related to understanding. 

“It is impossible to dissociate 
language from science or science 
from language, because every nat- 
ural science always involves three 
things: the sequence of phenom- 
ena on which the science is based; 
the abstract concepts which call 
these phenomena to mind; and the 
words in which the concepts are 
expressed. To call forth a con- 
cept a word is needed; to portray a 
phenomenon, a concept is needed. 
All three mirror one and the same 
reality.” — Antoine Lavoisier, 1789. 

“Professor Hubbard, you al- 
ways underestimate the difficulty 
of vocabulary.” — Helen Chigirin- 
skaya, Cornell University, 1997. 


All readers of this book will have solved systems of simultaneous linear equa- 
tions. Such problems arise throughout mathematics and its applications, so a 
thorough understanding of the problem is essential. 

What most students encounter in high school is systems of n equations in n 
unknowns, where n might be general or might be restricted to n = 2 and n = 3. 
Such a system usually has a unique solution, but sometimes something goes 
wrong: some equations are “consequences of others,” and have infinitely many 
solutions; other systems of equations are “incompatible,” and have no solutions. 
This chapter is largely concerned with making these notions systematic. 

A language has evolved to deal with these concepts, using the words “lin- 
ear transformation,” “linear combination,” “linear independence,” “kernel,” 
“span,” “basis,” and “dimension.” These words may sound unfriendly, but 
they correspond to notions which are unavoidable and actually quite trans- 
parent if thought of in terms of linear equations. They are needed to answer 
questions like: “how many equations are consequences of the others?” 

The relationship of these words to linear equations goes further. Theorems 
in linear algebra can be proved with abstract induction proofs, but students 
generally prefer the following method, which we discuss in this chapter: 

Reduce the statement to a statement about linear equations, row reduce
the resulting matrix, and see whether the statement becomes obvious.
If so, the statement is true; otherwise it is likely to be false.

Solving nonlinear equations is much harder. In the days before computers, 
finding solutions was virtually impossible; even in the good cases, where math- 
ematicians could prove that solutions existed, they were usually not concerned 
with whether their proof could be turned into a practical algorithm to find the 
solutions in question. The advent of computers has made such an abstract ap- 
proach unreasonable. Knowing that a system of equations has solutions is no 
longer enough; we want a practical algorithm that will enable us to solve them. 
The algorithm most often used is Newton’s method. In Section 2.7 we will show
Newton’s method in action, and state Kantorovitch’s theorem, which guarantees
that under appropriate circumstances Newton’s method converges to a
solution; in Section 2.8 we discuss the superconvergence of Newton’s method
and state a stronger version of Kantorovitch’s theorem, using the norm of a
matrix rather than its length.

In Section 2.9 we will base the implicit and inverse function theorems on 
Newton’s method. This gives more precise statements than the standard ap- 
proach, and we do not believe that it is harder. 

2.1 The Main Algorithm: Row Reduction 

Suppose we want to solve the system of linear equations 

\[ \begin{aligned} 2x + y + 3z &= 1 \\ x - y &= 1 \\ 2x + z &= 1. \end{aligned} \tag{2.1.1} \]

We could add together the first and second equations to get 3x + 3z = 2.
Substituting (2 - 3z)/3 for x in the third equation will give z = 1/3, hence
x = 1/3; putting this value for x into the second equation then gives y = -2/3.

In this section we will show how to make this approach systematic, using row 
reduction. The big advantage of row reduction is that it requires no cleverness, 
as we will see in Theorem 2.1.8. It gives a recipe so simple that the dumbest 
computer can follow it. 

The first step is to write the system of Equation 2.1.1 in matrix form. We can 
write the coefficients as one matrix, the unknowns as a vector and the constants 
on the right as another vector: 



\[ \underbrace{\begin{bmatrix} 2 & 1 & 3 \\ 1 & -1 & 0 \\ 2 & 0 & 1 \end{bmatrix}}_{\text{coefficient matrix } (A)} \qquad
   \underbrace{\begin{bmatrix} x \\ y \\ z \end{bmatrix}}_{\text{vector of unknowns } (\vec{x})} \qquad
   \underbrace{\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}}_{\text{constants } (\vec{b})} \]

Our system of equations can thus be written as the matrix multiplication
Ax = b:

\[ \begin{bmatrix} 2 & 1 & 3 \\ 1 & -1 & 0 \\ 2 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}. \tag{2.1.2} \]

The matrix A uses position to impart information, as do Arabic
numbers; in both cases, 0 plays a crucial role as place holder. In
the number 4 084, the two 4’s have very different meanings, as do the
1’s in the matrix: the 1 in the first column is the coefficient of x, the
1’s in the second column are the coefficients of y, and that in the
third column is the coefficient of z.

Using position to impart information allows for concision; in Roman
numerals, 4 084 is

MMMMLXXXIIII.

(To some extent, we use position when writing Roman numerals, as
in IV = 4 and VI = 6, but the Romans themselves were quite happy
writing their numbers in any order, MMXXM for 3 020, for example.)

The ith column of the matrix A corresponds to the ith unknown.

We now use a shorthand notation, omitting the vector x, and writing A and b
as a single matrix, with b the last column of the new matrix:

\[ \underbrace{\begin{bmatrix} 2 & 1 & 3 & 1 \\ 1 & -1 & 0 & 1 \\ 2 & 0 & 1 & 1 \end{bmatrix}}_{[A,\, b]} \tag{2.1.3} \]

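More generally, we see that a system of equations

\[ \begin{aligned} a_{1,1} x_1 + \cdots + a_{1,n} x_n &= b_1 \\ &\ \,\vdots \\ a_{m,1} x_1 + \cdots + a_{m,n} x_n &= b_m \end{aligned} \tag{2.1.4} \]

is the same as Ax = b:

\[ \underbrace{\begin{bmatrix} a_{1,1} & \cdots & a_{1,n} \\ \vdots & & \vdots \\ a_{m,1} & \cdots & a_{m,n} \end{bmatrix}}_{A}
   \underbrace{\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}}_{\vec{x}}
   = \underbrace{\begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix}}_{\vec{b}},
   \quad \text{i.e.,} \quad
   \underbrace{\begin{bmatrix} a_{1,1} & \cdots & a_{1,n} & b_1 \\ \vdots & & \vdots & \vdots \\ a_{m,1} & \cdots & a_{m,n} & b_m \end{bmatrix}}_{[A,\, b]}. \tag{2.1.5} \]

The first subscript in a pair of subscripts refers to vertical position,
and the second to horizontal position: $a_{1,n}$ is the coefficient for
the top row, nth column: first take the elevator, then walk down the
hall.

The matrix [A, b] is shorthand for the equation Ax = b.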

We denote by [A, b], with a comma, the matrix obtained by putting side-by- 
side the matrix A of coefficients and the vector b, as in the right-hand side of 
Equation 2.1.5. The comma is intended to avoid confusion with multiplication; 
we are not multiplying A and b. 


How would you write in matrix form the system of equations

\[ \begin{aligned} x + 3z &= 2 \\ 2x + y + z &= 0 \\ 2y + z &= 1? \end{aligned} \]

Check your answer below. 1

Row operations 

We can solve a system of linear equations by row reducing the corresponding 
matrix, using row operations. 

Definition 2.1.1 (Row operations). A row operation on a matrix is one 
of three operations: 

(1) Multiplying a row by a nonzero number, 

(2) Adding a multiple of a row onto another row, 

(3) Exchanging two rows. 
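
The three operations of Definition 2.1.1 can be sketched in a few lines of Python. This is only an illustration under our own conventions: the function names are hypothetical, rows are numbered from 0 rather than 1, and exact fractions are used so that no round-off occurs.

```python
from fractions import Fraction

def scale_row(M, i, c):
    """Operation (1): multiply row i by the nonzero number c."""
    if c == 0:
        raise ValueError("c must be nonzero")
    M[i] = [c * entry for entry in M[i]]

def add_multiple(M, src, dst, c):
    """Operation (2): add c times row src onto row dst."""
    M[dst] = [d + c * s for s, d in zip(M[src], M[dst])]

def swap_rows(M, i, j):
    """Operation (3): exchange rows i and j."""
    M[i], M[j] = M[j], M[i]

# Augmented matrix of the system 2x + y + 3z = 1, x - y = 1, 2x + z = 1.
M = [[Fraction(v) for v in row] for row in
     [[2, 1, 3, 1], [1, -1, 0, 1], [2, 0, 1, 1]]]

scale_row(M, 0, Fraction(1, 2))       # first row becomes [1, 1/2, 3/2, 1/2]
add_multiple(M, 0, 1, Fraction(-1))   # second row minus the new first row
add_multiple(M, 0, 2, Fraction(-2))   # third row minus twice the new first row
```

After these three calls the first column of M is (1, 0, 0): the unknown x has been eliminated from the second and third equations.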


Remark 2.1.2. We could just as 
well talk about column operations, 
substituting the word column for 
the word row in Definition 2.1.1. 
We will use column operations in 
Section 4.8. 


Exercise 2.1.3 asks you to show that the third operation is not necessary; 
one can exchange rows using operations (1) and (2). 

There are two good reasons why row operations are important. The first 
is that they require only arithmetic: addition, subtraction, multiplication and 
division. This is what computers do well; in some sense it is all they can do. 
And they spend a lot of time doing it: row operations are fundamental to most 
other mathematical algorithms. 

The other reason is that they will enable us to solve systems of linear equa- 
tions: 


Theorem 2.1.3. If the matrix [A, b] representing a system of linear equations
Ax = b can be turned into [A′, b′] by a sequence of row operations,
then the set of solutions of Ax = b and the set of solutions of A′x = b′ coincide.

Proof. Row operations consist of multiplying one equation by a nonzero number,
adding a multiple of one equation to another and exchanging two equations.
Any solution of Ax = b is thus a solution of A′x = b′. In the other direction,
any row operation can be undone by another row operation (Exercise 2.1.4), so
any solution of A′x = b′ is also a solution of Ax = b. □


Theorem 2.1.3 suggests that we solve Ax = b by using row operations to
bring the system of equations to the most convenient form. In Example 2.1.4 
we apply this technique to Equation 2.1.1. For now, don't worry about how 
the row reduction was achieved; this will be discussed soon, in the proof of 
Theorem 2.1.8. Concentrate instead on what the row reduced matrix tells us 
about solutions to the system of equations. 


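1 \[ \begin{bmatrix} 1 & 0 & 3 & 2 \\ 2 & 1 & 1 & 0 \\ 0 & 2 & 1 & 1 \end{bmatrix}, \quad \text{i.e.,} \quad
   \begin{bmatrix} 1 & 0 & 3 \\ 2 & 1 & 1 \\ 0 & 2 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}. \]

We said not to worry about how we did the row reduction in
Equation 2.1.7. But if you do worry, here are the steps: To get
(1), divide Row 1 by 2, and add -1/2 Row 1 to Row 2, and subtract
Row 1 from Row 3. To get from (1) to (2), multiply Row 2 by
-2/3, and then add that result to Row 3. From (2) to (3), subtract
half of Row 2 from Row 1. For (4), subtract Row 3 from Row 1. For
(5), subtract Row 3 from Row 2.

\[ (1)\ \begin{bmatrix} 1 & 1/2 & 3/2 & 1/2 \\ 0 & -3/2 & -3/2 & 1/2 \\ 0 & -1 & -2 & 0 \end{bmatrix} \qquad
   (2)\ \begin{bmatrix} 1 & 1/2 & 3/2 & 1/2 \\ 0 & 1 & 1 & -1/3 \\ 0 & 0 & -1 & -1/3 \end{bmatrix} \qquad
   (3)\ \begin{bmatrix} 1 & 0 & 1 & 2/3 \\ 0 & 1 & 1 & -1/3 \\ 0 & 0 & 1 & 1/3 \end{bmatrix} \]

\[ (4)\ \begin{bmatrix} 1 & 0 & 0 & 1/3 \\ 0 & 1 & 1 & -1/3 \\ 0 & 0 & 1 & 1/3 \end{bmatrix} \qquad
   (5)\ \begin{bmatrix} 1 & 0 & 0 & 1/3 \\ 0 & 1 & 0 & -2/3 \\ 0 & 0 & 1 & 1/3 \end{bmatrix} \]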



Echelon form is generally con- 
sidered best for solving systems of 
linear equations. (But it is not 
quite best for all purposes. See 
Exercise 2.1.9.) 


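Example 2.1.4 (Solving a system of equations with row operations).
To solve

\[ \begin{aligned} 2x + y + 3z &= 1 \\ x - y &= 1 \\ 2x + z &= 1, \end{aligned} \tag{2.1.6} \]

we can use row operations to bring the matrix

\[ \begin{bmatrix} 2 & 1 & 3 & 1 \\ 1 & -1 & 0 & 1 \\ 2 & 0 & 1 & 1 \end{bmatrix} \quad \text{to the form} \quad
   \begin{bmatrix} 1 & 0 & 0 & 1/3 \\ 0 & 1 & 0 & -2/3 \\ 0 & 0 & 1 & 1/3 \end{bmatrix}. \tag{2.1.7} \]

(To distinguish the new A and b from the old, we put a “tilde” on top: $\tilde{A}$, $\tilde{b}$.) In
this case, the solution can just be read off the matrix. If we put the unknowns
back in the matrix, we get

\[ \begin{bmatrix} x & 0 & 0 & 1/3 \\ 0 & y & 0 & -2/3 \\ 0 & 0 & z & 1/3 \end{bmatrix} \quad \text{or} \quad
   \begin{aligned} x &= 1/3 \\ y &= -2/3 \\ z &= 1/3. \end{aligned} \qquad \triangle \tag{2.1.8} \]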

Echelon form 

Of course some systems of linear equations may have no solutions, and others 
may have infinitely many. But if a system has solutions, they can be found by 
an appropriate sequence of row operations, called row reduction , bringing the 
matrix to echelon form , as in the second matrix of Equation 2.1.7. 


Definition 2.1.5 (Echelon form). A matrix is in echelon form if: 

(1) In every row, the first nonzero entry is 1, called a pivotal 1. 

(2) The pivotal 1 of a lower row is always to the right of the pivotal 1 of 
a higher row; 

(3) In every column that contains a pivotal 1, all other entries are 0.

(4) Any rows consisting entirely of 0’s are at the bottom. 


Clearly, the identity matrix is in echelon form. 
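
The four conditions of Definition 2.1.5 translate directly into a test that can be run on any matrix. The sketch below is ours, not the book's: `is_echelon` is a hypothetical name, and a matrix is assumed to be given as a list of rows of exact numbers (integers or fractions).

```python
def is_echelon(M):
    """Check the four conditions of Definition 2.1.5 for a matrix given as a list of rows."""
    pivot_cols = []
    seen_zero_row = False
    for i, row in enumerate(M):
        nonzero = [j for j, entry in enumerate(row) if entry != 0]
        if not nonzero:
            seen_zero_row = True
            continue
        if seen_zero_row:
            return False      # condition (4): a zero row lies above a nonzero row
        j = nonzero[0]
        if row[j] != 1:
            return False      # condition (1): the first nonzero entry must be 1
        if pivot_cols and j <= pivot_cols[-1]:
            return False      # condition (2): pivotal 1's must move to the right
        if any(M[k][j] != 0 for k in range(len(M)) if k != i):
            return False      # condition (3): the rest of a pivotal column is 0
        pivot_cols.append(j)
    return True

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
not_echelon = [[1, 1, 0, 1], [0, 0, 2, 0], [0, 0, 0, 1]]   # a first nonzero entry is 2
```

Here `is_echelon(identity)` is true, while `is_echelon(not_echelon)` is false because the second row begins with a 2 rather than a pivotal 1.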

Example 2.1.6 (Matrices in echelon form). The following matrices are in
echelon form; the pivotal 1’s are underlined:

\[ \begin{bmatrix} \underline{1} & 0 & 0 & 3 \\ 0 & \underline{1} & 0 & -2 \\ 0 & 0 & \underline{1} & 1 \end{bmatrix}, \quad
   \begin{bmatrix} \underline{1} & 1 & 0 & 0 \\ 0 & 0 & \underline{1} & 0 \\ 0 & 0 & 0 & \underline{1} \end{bmatrix}, \quad
   \begin{bmatrix} 0 & \underline{1} & 3 & 0 & 0 & 3 & 0 & -4 \\ 0 & 0 & 0 & \underline{1} & -2 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & \underline{1} & 2 \end{bmatrix} \]

Row reduction to echelon form is really a systematic form of elimination
of variables. The goal is to arrive, if possible, at a situation
where each row of the row-reduced matrix corresponds to just one
variable. Then, as in Equation 2.1.8, the solution can just be
read off the matrix.

Essentially every result in the first six sections of this chapter is
an elaboration of Theorem 2.1.8.

In MATLAB, the command rref (“row reduce echelon form”)
brings a matrix to echelon form.

Once you’ve gotten the hang of row reduction you’ll see that it
is perfectly simple (although we find it astonishingly easy to make
mistakes). There’s no need to look for tricks; you just trudge through
the calculations.


Computers use algorithms that 
are somewhat faster than the one 
we have outlined. Exercise 2.1.9 
explores the computational cost of 
solving a system of n equations in 
n unknowns. Partial row reduc- 
tion with back-substitution, de- 
fined in the exercise, is roughly a 
third cheaper than full row reduc- 
tion. You may want to take short- 
cuts too; for example, if the first 
row of your matrix starts with a 
3, and the third row starts with 
a 1, you might want to make the 
third row the first one, rather than 
dividing through by 3. 


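Example 2.1.7 (Matrices not in echelon form). The following matrices
are not in echelon form. Can you say why not? 2

\[ \begin{bmatrix} 1 & 0 & 0 & 2 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \end{bmatrix}, \quad
   \begin{bmatrix} 1 & 1 & 0 & 1 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad
   \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \quad
   \begin{bmatrix} 0 & 1 & 0 & 3 & 0 & -3 \\ 0 & 0 & -1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 & 2 \end{bmatrix} \]

Exercise 2.1.5 asks you to bring them to echelon form.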


How to row reduce a matrix 

The following result and its proof are absolutely fundamental: 


Theorem 2.1.8. Given any matrix A, there exists a unique matrix $\tilde{A}$ in
echelon form that can be obtained from A by row operations.

Proof. The proof of this theorem is more important than the result: it is an
explicit algorithm for computing $\tilde{A}$. Called row reduction or Gaussian elimination
(or several other names), it is the main tool for solving linear equations.

Row reduction: the algorithm. To bring a matrix to echelon form: 

(1) Look down the first column until you find a nonzero entry, called a pivot. 
If there is none, look down the second column, etc. 

(2) Put the row containing the pivot in the first row position, and then divide 
it by the pivot to make its first entry a pivotal 1, as defined above. 

(3) Add appropriate multiples of this row onto the other rows to cancel the 
entries in the first column of each of the other rows. 

Now look down the next column over, (and then the next column if necessary, 
etc.) starting beneath the row you just worked with, and look for a nonzero 
entry (the next pivot). As above, exchange its row with the second row, divide 
through, etc. 

This proves existence of a matrix in echelon form that can be obtained from 
a given matrix. Uniqueness is more subtle and will have to wait; it uses the 
notion of linear independence, and is proved in Exercise 2.4.10. □ 
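
The algorithm in the proof can be written out as a short program. What follows is a minimal sketch rather than a definitive implementation: the name `row_reduce` is ours, rows and columns are numbered from 0, and exact fractions are used so that the question of round-off does not arise.

```python
from fractions import Fraction

def row_reduce(M):
    """Return the echelon form of M (a list of rows), using exact rational arithmetic."""
    A = [[Fraction(v) for v in row] for row in M]
    n_rows = len(A)
    n_cols = len(A[0]) if A else 0
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row == n_rows:
            break
        # Step (1): look down this column, at or below pivot_row, for a nonzero entry.
        candidates = [r for r in range(pivot_row, n_rows) if A[r][col] != 0]
        if not candidates:
            continue                  # no pivot in this column; try the next one
        r = candidates[0]
        # Step (2): move that row into position and divide it by the pivot.
        A[pivot_row], A[r] = A[r], A[pivot_row]
        pivot = A[pivot_row][col]
        A[pivot_row] = [entry / pivot for entry in A[pivot_row]]
        # Step (3): cancel the entries of this column in all the other rows.
        for k in range(n_rows):
            if k != pivot_row and A[k][col] != 0:
                factor = A[k][col]
                A[k] = [a - factor * p for a, p in zip(A[k], A[pivot_row])]
        pivot_row += 1
    return A

# The augmented matrix of Example 2.1.4.
reduced = row_reduce([[2, 1, 3, 1], [1, -1, 0, 1], [2, 0, 1, 1]])
# reduced has rows [1, 0, 0, 1/3], [0, 1, 0, -2/3], [0, 0, 1, 1/3].
```

The sketch takes the first nonzero entry of each column as the pivot, exactly as in step (1) of the proof; this is fine with exact fractions but is not what one would do in floating-point arithmetic.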

Example 2.1.9 (Row reduction). Here we row reduce a matrix. The R’s
refer in each case to the rows of the immediately preceding matrix. For example,
the second row of the second matrix is labeled $R_1 + R_2$, because that row is
obtained by adding the first and second rows of the preceding matrix.

\[ \begin{bmatrix} 1 & 2 & 3 & 1 \\ -1 & 1 & 0 & 2 \\ 1 & 0 & 1 & 2 \end{bmatrix}
   \to \begin{matrix} \\ R_1 + R_2 \\ R_3 - R_1 \end{matrix}
   \begin{bmatrix} 1 & 2 & 3 & 1 \\ 0 & 3 & 3 & 3 \\ 0 & -2 & -2 & 1 \end{bmatrix}
   \to \begin{matrix} \\ R_2/3 \\ \ \end{matrix}
   \begin{bmatrix} 1 & 2 & 3 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & -2 & -2 & 1 \end{bmatrix} \]

\[ \to \begin{matrix} R_1 - 2R_2 \\ \\ R_3 + 2R_2 \end{matrix}
   \begin{bmatrix} 1 & 0 & 1 & -1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 3 \end{bmatrix}
   \to \begin{matrix} \\ \\ R_3/3 \end{matrix}
   \begin{bmatrix} 1 & 0 & 1 & -1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}. \]

Note that in the fourth matrix we were unable to find a nonzero entry in the
third column, third row, so we had to look in the next column over, where there
is a 3. △

2 The first matrix violates rule (2); the second violates rules (1) and (3); the third
violates rule (4), and the fourth violates rule (3).


Just as you should know how 
to add and multiply, you should 
know how to row reduce, but the 
goal is not to compete with a com- 
puter, or even a scientific calcula- 
tor; that’s a losing proposition. 


This is not a small issue. Com- 
puters spend most of their time 
solving linear equations by row re- 
duction. Keeping loss of precision 
due to round-off errors from get- 
ting out of hand is critical. En- 
tire professional journals are de- 
voted to this topic; at a university 
like Cornell perhaps half a dozen 
mathematicians and computer sci- 
entists spend their lives trying to 
understand it. 


Exercise 2.1.7 provides practice in row reducing matrices. It should serve also 
to convince you that it is indeed possible to bring any matrix to echelon form. 

When computers row reduce: avoiding loss of precision 

Matrices generated by computer operations often have entries that are really
zero but are made nonzero by round-off error: for example, a number may be
subtracted from a number that in theory is the same, but in practice is off by,
say, $10^{-50}$, because it has been rounded off. Such an entry is a poor choice
for a pivot, because you will need to divide its row through by it, and the row
will then contain very large entries. When you then add multiples of that row
onto another row, you will be committing the basic sin of computation: adding
numbers of very different sizes, which leads to loss of precision. So, what do
you do? You skip over that almost-zero entry and choose another pivot. There
is, in fact, no reason to choose the first nonzero entry in a given column; in
practice, when computers row reduce matrices, they always choose the largest.


Example 2.1.10 (Thresholding to avoid round-off errors). If you are
computing to 10 significant digits, then $1 + 10^{-10} = 1.0000000001 = 1$. So
consider the system of equations

\[ \begin{aligned} 10^{-10} x + 2y &= 1 \\ x + y &= 1, \end{aligned} \tag{2.1.9} \]

the solution of which is

\[ x = \frac{1}{2 - 10^{-10}}, \qquad y = \frac{1 - 10^{-10}}{2 - 10^{-10}}. \tag{2.1.10} \]

If you are computing to 10 significant digits, this is x = y = .5. If you actually
use $10^{-10}$ as a pivot, the row reduction, to 10 significant digits, goes as follows:

\[ \begin{bmatrix} 10^{-10} & 2 & 1 \\ 1 & 1 & 1 \end{bmatrix}
   \to \begin{bmatrix} 1 & 2 \cdot 10^{10} & 10^{10} \\ 1 & 1 & 1 \end{bmatrix}
   \to \begin{bmatrix} 1 & 2 \cdot 10^{10} & 10^{10} \\ 0 & -2 \cdot 10^{10} & -10^{10} \end{bmatrix}
   \to \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & .5 \end{bmatrix}. \]

The “solution” shown by the last matrix reads x = 0, which is badly wrong: x
is supposed to be .5. Now do the row reduction treating $10^{-10}$ as zero; what
do you get? If you have trouble, check the answer in the footnote. 3 △

Exercise 2.1.8 asks you to analyze precisely where the troublesome errors 
occurred. All computations have been carried out to 10 significant digits only. 
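
The effect can be reproduced with Python's standard `decimal` module, which lets us round every arithmetic result to 10 significant digits. The sketch below is an illustration under our own assumptions: `solve_2x2` is a hypothetical helper that always pivots on the top-left entry, and the remedy shown is exchanging the two rows so that the pivot is the larger entry of the first column (the choice the text attributes to computers), rather than literally replacing $10^{-10}$ by zero.

```python
from decimal import Decimal, getcontext

getcontext().prec = 10   # every arithmetic result is rounded to 10 significant digits

def solve_2x2(rows):
    """Row reduce a 2 x 3 augmented matrix, pivoting on rows[0][0] and then on the second row."""
    (a, b, e), (c, d, f) = rows
    b, e = b / a, e / a              # divide the first row by its pivot a
    d, f = d - c * b, f - c * e      # subtract c times the first row from the second
    f = f / d                        # divide the second row by its pivot
    e = e - b * f                    # cancel the second column of the first row
    return e, f                      # the computed values of x and y

tiny = Decimal("1e-10")
row1 = (tiny, Decimal(2), Decimal(1))          # 10^-10 x + 2y = 1
row2 = (Decimal(1), Decimal(1), Decimal(1))    # x + y = 1

x_bad, y_bad = solve_2x2([row1, row2])     # uses 10^-10 as the first pivot
x_good, y_good = solve_2x2([row2, row1])   # rows exchanged: the first pivot is 1
```

With the tiny pivot the computed x is nowhere near .5, although y is still about right; with the rows exchanged both x and y agree with .5 to the working precision.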


2.2 Solving Equations Using Row Reduction 


Recall (Equations 2.1.4 and 2.1.5) that Ax = b represents a
system of equations, the matrix A giving the coefficients, the vector x
giving the unknowns (for example, for a system with three unknowns,
$\vec{x} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$), and the vector b giving the constants on the
right-hand side. The matrix [A, b] is shorthand for Ax = b.


In this section we will see, in Theorem 2.2.4, what a row-reduced matrix
representing a system of linear equations tells us about its solutions. To solve
the system of linear equations Ax = b, form the matrix [A, b] and row reduce
it to echelon form, giving $[\tilde{A}, \tilde{b}]$. If the system has a unique solution, it can
then be read off the matrix, as in Example 2.1.4. If it does not, the matrix will
tell you whether there is no solution, or infinitely many solutions. Although
the theorem is practically obvious, it is the backbone of the entire part of linear
algebra that deals with linear equations, dimension, bases, rank, and so forth.

Remark. In Theorem 2.1.8 we used the symbol tilde to denote the echelon
form of a matrix: $\tilde{A}$ is the echelon form of A, obtained by row reduction. Here,
$[\tilde{A}, \tilde{b}]$ represents the echelon form of the entire “augmented” matrix [A, b]: i.e.,
it is $\widetilde{[A, b]}$. We use two tildes rather than one wide one because we need to talk
about $\tilde{b}$ independently of $\tilde{A}$. △

In the matrix [A, b], the columns of A correspond in the obvious way to
the unknowns $x_i$ of the system Ax = b: the ith column corresponds to the
ith unknown. In Theorem 2.2.4 we will want to distinguish between those
unknowns corresponding to pivotal columns and those corresponding to
non-pivotal columns.


Definition 2.2.1 (Pivotal column). A pivotal column of A is a column
of A such that the corresponding column of $\tilde{A}$ contains a pivotal 1.


A non-pivotal column is a column of A such that the corresponding column
of $\tilde{A}$ does not contain a pivotal 1.

3 Remember to put the second row in the first row position:

\[ \begin{bmatrix} 0 & 2 & 1 \\ 1 & 1 & 1 \end{bmatrix}
   \to \begin{bmatrix} 1 & 1 & 1 \\ 0 & 2 & 1 \end{bmatrix}
   \to \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & .5 \end{bmatrix}
   \to \begin{bmatrix} 1 & 0 & .5 \\ 0 & 1 & .5 \end{bmatrix}. \]

The terms “pivotal” and “non-pivotal” do not describe some intrinsic
quality of a particular unknown. If a system of equations
has both pivotal and non-pivotal unknowns, which are pivotal and
which are not may depend on the order in which you list the unknowns,
as illustrated by Exercise 2.2.1.

Definition 2.2.2 (Pivotal unknown). A pivotal unknown (or pivotal
variable) of a system of linear equations Ax = b is an unknown corresponding
to a pivotal column of A: $x_i$ is a pivotal unknown if the ith column of $\tilde{A}$
contains a pivotal 1. A non-pivotal unknown corresponds to a non-pivotal
column of A: $x_j$ is a non-pivotal unknown if the jth column of $\tilde{A}$ does not
contain a pivotal 1.


Example 2.2.3 (Pivotal and non-pivotal unknowns). The matrix

\[ [A, b] = \begin{bmatrix} 2 & 1 & 3 & 1 \\ 1 & -1 & 0 & 1 \\ 1 & 1 & 2 & 1 \end{bmatrix} \]

corresponding to the system of equations

\[ \begin{aligned} 2x + y + 3z &= 1 \\ x - y &= 1 \\ x + y + 2z &= 1 \end{aligned} \]

row reduces to

\[ [\tilde{A}, \tilde{b}] = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \]

so x and y are pivotal unknowns, and z is a non-pivotal unknown. △

The row reduction in Example 2.2.3 is unusually simple in the
sense that it involves no fractions; this is the exception rather than
the rule. Don’t be alarmed if your calculations look a lot messier.

In Example 2.2.3 the non-pivotal unknown z corresponds to
the third entry of x; the system of equations

\[ \begin{aligned} 2x + y + 3z &= 1 \\ x - y &= 1 \\ x + y + 2z &= 1 \end{aligned} \]

corresponds to the multiplication

\[ \begin{bmatrix} 2 & 1 & 3 \\ 1 & -1 & 0 \\ 1 & 1 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}. \]

Here is what Theorems 2.1.3 and 2.1.8 do for us:

Theorem 2.2.4 (Solutions to linear equations). Represent the system
Ax = b, involving m linear equations in n unknowns, by the m x (n + 1)
matrix [A, b], which row reduces to $[\tilde{A}, \tilde{b}]$. Then

(1) If the row-reduced vector $\tilde{b}$ contains a pivotal 1, the system has no
solutions.

(2) If $\tilde{b}$ does not contain a pivotal 1, then:

(a) if there are no non-pivotal unknowns (i.e., each column of $\tilde{A}$
contains a pivotal 1), the system has a unique solution;

(b) if at least one unknown is non-pivotal, there are infinitely many
solutions; you can choose freely the values of the non-pivotal
unknowns, and these values will determine the values of the
pivotal unknowns.

There is one case where this is of such importance that we isolate it as a
separate theorem, even though it is a special case of part (2a).

Theorem 2.2.5. A system Ax = b has a unique solution for every b if and
only if A row reduces to the identity. (For this to occur, there must be as
many equations as unknowns, i.e., A must be square.)


The nonlinear versions of these 
two theorems are the inverse func- 
tion theorem and the implicit 
function theorem, discussed in 
Section 2.9. In the nonlinear case, 
we define the pivotal and non- 
pivotal unknowns as being those 
of the linearized problems; as in 
the linear case, the pivotal un- 
knowns are implicit functions of 
the non-pivotal unknowns. But 
those implicit functions will be de- 
fined only in a small region, and 
which variables are pivotal and 
which are not depends on where 
we compute our linearization. 


We will prove Theorem 2.2.4 after looking at some examples. Let us consider 
the case where the results are most intuitive, where n = m. The case where the 
system of equations has a unique solution is illustrated by Example 2.1.4. The 
other two — no solution and infinitely many solutions — are illustrated below. 
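
The three cases of Theorem 2.2.4 depend only on which columns of the row-reduced augmented matrix contain a pivotal 1, so they can be decided mechanically. The following is a rough sketch under our own naming: `row_reduce` and `classify` are hypothetical helpers, columns are numbered from 0, and exact fractions are used throughout.

```python
from fractions import Fraction

def row_reduce(M):
    """Echelon form of M together with the list of its pivotal column indices (0-based)."""
    A = [[Fraction(v) for v in row] for row in M]
    pivots, r = [], 0
    for col in range(len(A[0])):
        rows = [i for i in range(r, len(A)) if A[i][col] != 0]
        if not rows:
            continue                              # no pivot in this column
        A[r], A[rows[0]] = A[rows[0]], A[r]       # bring the pivot row into position
        A[r] = [v / A[r][col] for v in A[r]]      # make the pivot a pivotal 1
        for i in range(len(A)):
            if i != r and A[i][col] != 0:         # clear the rest of the column
                A[i] = [a - A[i][col] * p for a, p in zip(A[i], A[r])]
        pivots.append(col)
        r += 1
        if r == len(A):
            break
    return A, pivots

def classify(aug):
    """Apply Theorem 2.2.4 to an augmented matrix [A, b] with n unknowns."""
    n = len(aug[0]) - 1
    _, pivots = row_reduce(aug)
    if n in pivots:               # pivotal 1 in the last column: case (1)
        return "no solution"
    if len(pivots) == n:          # every column of A is pivotal: case (2a)
        return "unique solution"
    return "infinitely many solutions"   # some unknown is non-pivotal: case (2b)

case_unique = classify([[2, 1, 3, 1], [1, -1, 0, 1], [2, 0, 1, 1]])   # Example 2.1.4
case_none = classify([[2, 1, 3, 1], [1, -1, 0, 1], [1, 1, 2, 1]])     # same left side as Example 2.2.3
```

For the system of Example 2.1.4 the answer is "unique solution"; for the matrix of Example 2.2.3, whose last column becomes pivotal, it is "no solution".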

Example 2.2.6 (A system with no solutions). Let us solve

\[ \begin{aligned} 2x + y + 3z &= 1 \\ x - y &= 1 \\ x + y + 2z &= 1. \end{aligned} \tag{2.2.1} \]

The matrix

\[ \begin{bmatrix} 2 & 1 & 3 & 1 \\ 1 & -1 & 0 & 1 \\ 1 & 1 & 2 & 1 \end{bmatrix} \quad \text{row reduces to} \quad
   \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \tag{2.2.2} \]

so the equations are incompatible and there are no solutions; the last row tells
us that 0 = 1. △

Note, as illustrated by Equation 2.2.2, that if $\tilde{b}$ (i.e., the last
column in the row-reduced matrix $[\tilde{A}, \tilde{b}]$) contains a pivotal 1, then
necessarily all the entries to the left of the pivotal 1 are zero, by
definition.

Example 2.2.7 (A system with infinitely many solutions). Let us solve

\[ \begin{aligned} 2x + y + 3z &= 1 \\ x - y &= 1 \\ x + y + 2z &= 1/3. \end{aligned} \tag{2.2.3} \]

The matrix

\[ \begin{bmatrix} 2 & 1 & 3 & 1 \\ 1 & -1 & 0 & 1 \\ 1 & 1 & 2 & 1/3 \end{bmatrix} \quad \text{row reduces to} \quad
   \begin{bmatrix} 1 & 0 & 1 & 2/3 \\ 0 & 1 & 1 & -1/3 \\ 0 & 0 & 0 & 0 \end{bmatrix}. \tag{2.2.4} \]

The first row of the matrix says that x + z = 2/3; the second that y + z =
-1/3. You can choose z arbitrarily, giving the solutions

\[ \begin{bmatrix} 2/3 - z \\ -1/3 - z \\ z \end{bmatrix}; \tag{2.2.5} \]

there are as many solutions as there are possible values of z: an infinite number.
In this system of equations, the third equation provides no new information; it
is a consequence of the first two. If we denote the three equations $R_1$, $R_2$ and
$R_3$ respectively, then $R_3 = \frac{1}{3}(2R_1 - R_2)$:

\[ \begin{array}{lrcl} 2R_1: & 4x + 2y + 6z &=& 2 \\ -R_2: & -x + y &=& -1 \\ \hline 2R_1 - R_2 = 3R_3: & 3x + 3y + 6z &=& 1. \end{array} \qquad \triangle \]

In this case, the solutions form a family that depends on the single
non-pivotal variable, z; $\tilde{A}$ has one column that does not contain a
pivotal 1.

In the examples we have seen so far, b was a vector with numbers as entries. 
What if its entries are symbolic? Depending on the values of the symbols, 
different cases of Theorem 2.2.4 may apply. 


Example 2.2.8 (Equations with symbolic coefficients). Suppose we want 
to know what solutions, if any, exist for the system of equations 


\[
\begin{aligned}
x_1 + x_2 &= a_1 \\
x_2 + x_3 &= a_2 \\
x_3 + x_4 &= a_3 \\
x_4 + x_1 &= a_4.
\end{aligned} \tag{2.2.6}
\]

Row operations bring the matrix

\[
\begin{bmatrix}
1 & 1 & 0 & 0 & a_1 \\
0 & 1 & 1 & 0 & a_2 \\
0 & 0 & 1 & 1 & a_3 \\
1 & 0 & 0 & 1 & a_4
\end{bmatrix}
\quad\text{to}\quad
\begin{bmatrix}
1 & 0 & 0 & 1 & a_1 + a_3 - a_2 \\
0 & 1 & 0 & -1 & a_2 - a_3 \\
0 & 0 & 1 & 1 & a_3 \\
0 & 0 & 0 & 0 & a_2 + a_4 - a_1 - a_3
\end{bmatrix}, \tag{2.2.7}
\]

so a first thing to notice is that there are no solutions if $a_2 + a_4 - a_1 - a_3 \neq 0$: we are then in case (1) of Theorem 2.2.4. Solutions exist only if $a_2 + a_4 - a_1 - a_3 = 0$. If that condition is met, we are in case (2b) of Theorem 2.2.4: there is no pivotal 1 in the last column, so the system has infinitely many solutions, depending on the value of the single non-pivotal variable, $x_4$, corresponding to the fourth column. △

If we had arranged the columns differently, a different variable would be non-pivotal; the four variables here play completely symmetrical roles.

Figure 2.2.1.

\[
[\tilde A, \tilde{\mathbf b}] = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]

Case 1: No solution. The row-reduced column $\tilde{\mathbf b}$ contains a pivotal 1; the third line reads 0 = 1. (The left-hand side of that line must contain all 0's; if the third entry were not 0, it would be a pivotal 1, and then $\tilde{\mathbf b}$ would contain no pivotal 1.)

Figure 2.2.2.

\[
[\tilde A, \tilde{\mathbf b}] = \begin{bmatrix} 1 & 0 & 0 & \tilde b_1 \\ 0 & 1 & 0 & \tilde b_2 \\ 0 & 0 & 1 & \tilde b_3 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]

Case 2a: Unique solution. Each column of $\tilde A$ contains a pivotal 1, giving $x_1 = \tilde b_1$; $x_2 = \tilde b_2$; $x_3 = \tilde b_3$.
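
The case analysis of Example 2.2.8 can be sketched in a few lines. This is only an illustration under the assumption that the system is $x_1 + x_2 = a_1$, $x_2 + x_3 = a_2$, $x_3 + x_4 = a_3$, $x_4 + x_1 = a_4$; the function name `solve_example_2_2_8` is hypothetical.

```python
from fractions import Fraction

def solve_example_2_2_8(a1, a2, a3, a4, x4=0):
    """Return a solution (x1, x2, x3, x4) for the chosen free value x4,
    or None when the compatibility condition fails (case 1)."""
    a1, a2, a3, a4, x4 = (Fraction(t) for t in (a1, a2, a3, a4, x4))
    if a2 + a4 - a1 - a3 != 0:
        return None                 # the last row reads 0 = (non-zero number)
    # Read the pivotal variables off the rows of the row-reduced matrix.
    x1 = a1 + a3 - a2 - x4          # row 1: x1 + x4 = a1 + a3 - a2
    x2 = a2 - a3 + x4               # row 2: x2 - x4 = a2 - a3
    x3 = a3 - x4                    # row 3: x3 + x4 = a3
    return (x1, x2, x3, x4)

# a2 + a4 - a1 - a3 = 2, so there is no solution.
print(solve_example_2_2_8(1, 2, 3, 4))
# The condition holds; each value of x4 gives a different solution.
print(solve_example_2_2_8(1, 2, 3, 2, x4=5))
```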

Proof of Theorem 2.2.4. Case (1). If the row-reduced vector $\tilde{\mathbf b}$ contains a pivotal 1, the system has no solutions.

Proof: The set of solutions of $A\mathbf x = \mathbf b$ is the same as that of $\tilde A\mathbf x = \tilde{\mathbf b}$ by Theorem 2.1.3. If $\tilde b_j$ is a pivotal 1, then the $j$th equation of $\tilde A\mathbf x = \tilde{\mathbf b}$ reads 0 = 1 (as illustrated by the matrix in Figure 2.2.1), so the system is inconsistent.

Case (2a). If $\tilde{\mathbf b}$ does not contain a pivotal 1, and each column of $\tilde A$ contains a pivotal 1, the system has a unique solution.

Proof: This occurs only if there are at least as many equations as unknowns (there may be more, as shown in Figure 2.2.2). If each column of $\tilde A$ contains a pivotal 1, and $\tilde{\mathbf b}$ has no pivotal 1, then for each variable $x_i$ there is a unique solution $x_i = \tilde b_i$; all other entries in the $i$th row will be 0, by the rules of row reduction. If there are more equations than unknowns, the extra equations do not make the system incompatible, since by the rules of row reduction, the corresponding rows will contain all 0's, giving the correct if uninformative equation 0 = 0.

Figure 2.2.3.

\[
[\tilde A, \tilde{\mathbf b}] = \begin{bmatrix} 1 & 0 & -1 & \tilde b_1 \\ 0 & 1 & 2 & \tilde b_2 \\ 0 & 0 & 0 & 0 \end{bmatrix},
\qquad
[\tilde B, \tilde{\mathbf b}] = \begin{bmatrix} 1 & 0 & 3 & 0 & 2 & \tilde b_1 \\ 0 & 1 & 1 & 0 & 0 & \tilde b_2 \\ 0 & 0 & 0 & 1 & 1 & \tilde b_3 \end{bmatrix}
\]

Case 2b: Infinitely many solutions (one for each value of non-pivotal variables).

Case (2b). If $\tilde{\mathbf b}$ does not contain a pivotal 1, and at least one column of $\tilde A$ contains no pivotal 1, there are infinitely many solutions: you can choose freely the values of the non-pivotal unknowns, and these values will determine the values of the pivotal unknowns.

Proof: A pivotal 1 in the $i$th column corresponds to the pivotal variable $x_i$. The row containing this pivotal 1 (which is often the $i$th row but may not be, as shown in Figure 2.2.3, matrix $\tilde B$) contains no other pivotal 1's: all other non-zero entries in that row correspond to non-pivotal unknowns. (For example, in the row-reduced matrix $\tilde A$ of Figure 2.2.3, the $-1$ in the first row, and the 2 in the second row, both correspond to the non-pivotal variable $x_3$.)

Thus if there is a pivotal 1 in the $j$th row, corresponding to the pivotal unknown $x_i$, then $x_i$ equals $\tilde b_j$ minus the sum of the products of the non-pivotal unknowns $x_k$ and their (row-reduced) coefficients in the $j$th row:

\[
x_i = \tilde b_j - \underbrace{\sum_{k \text{ non-pivotal}} \tilde a_{j,k}\, x_k}_{\substack{\text{sum of products of the non-pivotal unknowns}\\ \text{in $j$th row and their coefficients}}} \tag{2.2.8}
\]

For the matrix $\tilde A$ of Figure 2.2.3 we get

\[
x_1 = \tilde b_1 + x_3 \quad\text{and}\quad x_2 = \tilde b_2 - 2x_3;
\]

we can make $x_3$ equal anything we like; our choice will determine the values of the pivotal variables $x_1$ and $x_2$. What are the equations for the pivotal variables of matrix $\tilde B$ in Figure 2.2.3?⁴
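
The three cases of Theorem 2.2.4 can be read off from the positions of the pivotal 1's. Below is a minimal sketch, assuming a plain Gauss-Jordan row reduction over exact fractions; the names `rref` and `classify` are illustrative and not part of the text.

```python
from fractions import Fraction

def rref(rows):
    """Row reduce a matrix (list of rows) to reduced echelon form.

    Returns the reduced matrix and the 0-based indices of the pivotal columns.
    """
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # Look for a non-zero entry in column c, at or below row r.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue                          # column c is non-pivotal
        m[r], m[p] = m[p], m[r]               # exchange rows
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]      # scale to get a pivotal 1
        for i in range(len(m)):
            if i != r:
                f = m[i][c]                   # clear the rest of column c
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def classify(A, b):
    """Return which case of Theorem 2.2.4 applies to the system Ax = b."""
    n = len(A[0])                                 # number of unknowns
    aug = [list(row) + [bi] for row, bi in zip(A, b)]
    _, pivots = rref(aug)
    if n in pivots:                               # pivotal 1 in the last column
        return "no solution"
    if len(pivots) == n:                          # every column of A is pivotal
        return "unique solution"
    return "infinitely many solutions"

A = [[2, 1, 3], [1, -1, 0], [1, 1, 2]]
print(classify(A, [1, 1, Fraction(1, 3)]))        # third equation is redundant
print(classify(A, [1, 1, 1]))                     # third equation contradicts the others
```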

How many equations in how many unknowns? 

In most cases, the outcomes given by Theorem 2.2.4 can be predicted by considering how many equations you have for how many unknowns. If you have $n$ equations for $n$ unknowns, most often there will be a unique solution. In terms of row reduction, $A$ will be square, and most often row reduction will result in every row of $\tilde A$ having a pivotal 1; i.e., $\tilde A$ will be the identity. This is not always the case, however, as we saw in Examples 2.2.6 and 2.2.7.

4 The pivotal variables $x_1$, $x_2$ and $x_4$ depend on our choice of values for the non-pivotal variables $x_3$ and $x_5$:

\[
\begin{aligned}
x_1 &= \tilde b_1 - 3x_3 - 2x_5 \\
x_2 &= \tilde b_2 - x_3 \\
x_4 &= \tilde b_3 - x_5.
\end{aligned}
\]
Figure 2.2.4.

Top: Two lines meet in a sin- 
gle point, representing the unique 
solution to two equations in two 
unknowns. Middle: A case where 
two equations in two unknowns 
have no solution. Bottom: Two 
lines are colinear, representing a 
case where two equations in two 
unknowns have infinitely many so- 
lutions. 


If you have more equations than unknowns, as in Exercise 2.1.7(b), you would expect there to be no solutions; only in very special cases can $n - 1$ unknowns satisfy $n$ equations. In terms of row reduction, in this case $A$ will have more rows than columns, and at least one row of $\tilde A$ will not have a pivotal 1. A row of $\tilde A$ without a pivotal 1 will consist of 0's; if the adjacent entry of $\tilde{\mathbf b}$ is non-zero (as is likely), then the system will have no solutions.

If you have fewer equations than unknowns, as in Exercise 2.2.2(e), you would expect infinitely many solutions. In terms of row reduction, $A$ will have fewer rows than columns, so at least one column of $\tilde A$ will contain no pivotal 1: there will be at least one non-pivotal unknown. In most cases, $\tilde{\mathbf b}$ will not contain a pivotal 1. (If it does, then that pivotal 1 is preceded by a row of 0's.)

Geometric interpretation of solutions 

These examples have a geometric interpretation. The top graph in Figure 2.2.4 
shows the case where two equations in two unknowns have a unique solution. 
As you surely know, two equations in two unknowns,

\[
\begin{aligned}
a_1 x + b_1 y &= c_1 \\
a_2 x + b_2 y &= c_2,
\end{aligned} \tag{2.2.9}
\]

are incompatible if and only if the lines $\ell_1$ and $\ell_2$ in $\mathbb R^2$ with equations $a_1 x + b_1 y = c_1$ and $a_2 x + b_2 y = c_2$ are parallel and distinct (middle graph, Figure 2.2.4). The equations have infinitely many solutions if and only if $\ell_1 = \ell_2$ (bottom graph, Figure 2.2.4).

When you have three equations in three unknowns, each equation describes 
a plane in R 3 . The top graph of Figure 2.2.5 shows three planes meeting in a 

single point, the case where three equations in three unknowns have a unique 
solution. 

There are two ways for the equations in R 3 to be incompatible, which means 
that the planes do not intersect. One way is that two of the planes are parallel, 
but this is not the only, or even the usual way: they will also be incompatible 
if no two are parallel, but the line of intersection of any two is parallel to the 

third, as shown by the middle graph of Figure 2.2.5. This latter possibility 
occurs in Example 2.2.6. 

There are also two ways for equations in R 3 to have infinitely many solutions. 
The three planes may coincide, but again this is not necessary or usual. The
equations will also have infinitely many solutions if the planes intersect in a 
common line, as shown by the bottom graph of Figure 2.2.5. (This second 
possibility occurs in Example 2.2.7.) 
Solving several systems of linear equations with one matrix

Theorem 2.2.4 has an additional spinoff. If you want to solve several systems 
of n linear equations in n unknowns that have the same matrix of coefficients, 
you can deal with them all at once, using row reduction. This will be useful 
when we compute inverses of matrices in Section 2.3. 

Corollary 2.2.9 (Solving several systems of equations simultaneously). Several systems of $n$ linear equations in $n$ unknowns, with the same coefficients (e.g. $A\mathbf x = \mathbf b_1, \dots, A\mathbf x = \mathbf b_k$) can be solved at once with row reduction. Form the matrix

\[
[A, \mathbf b_1, \dots, \mathbf b_k] \quad\text{and row reduce it to get}\quad [\tilde A, \tilde{\mathbf b}_1, \dots, \tilde{\mathbf b}_k].
\]

If $\tilde A$ is the identity, then $\tilde{\mathbf b}_i$ is the solution to the $i$th equation $A\mathbf x = \mathbf b_i$.

If $A$ row reduces to the identity, the row reduction is completed by the time one has dealt with the last row of $A$. The row operations needed to turn $A$ into $\tilde A$ affect each $\mathbf b_i$, but the $\mathbf b_i$ do not affect each other.
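
The corollary can be sketched in code as follows. This is an illustration only, assuming a plain Gauss-Jordan reduction over exact fractions; `rref` and `solve_many` are hypothetical names, and the data are the coefficient matrix and the three right-hand sides of Example 2.2.10.

```python
from fractions import Fraction

def rref(rows):
    """Row reduce a matrix; return (reduced matrix, pivotal column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue                          # non-pivotal column
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]
        for i in range(len(m)):
            if i != r:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def solve_many(A, bs):
    """Solve Ax = b for every b in bs with a single row reduction.

    Returns one solution per right-hand side, or None if the square
    matrix A does not row reduce to the identity.
    """
    n = len(A)
    # Append all right-hand sides as extra columns: [A, b_1, ..., b_k].
    aug = [list(A[i]) + [b[i] for b in bs] for i in range(n)]
    reduced, pivots = rref(aug)
    if pivots[:n] != list(range(n)):
        return None
    # Column n + k of the reduced matrix solves the k-th system.
    return [[reduced[i][n + k] for i in range(n)] for k in range(len(bs))]

A = [[2, 1, 3], [1, -1, 1], [1, 1, 2]]
for x in solve_many(A, [[1, 1, 1], [2, 0, 1], [0, 1, 1]]):
    print([str(t) for t in x])
```

Each extra column rides along through the same row operations, which is why the right-hand sides never interact.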

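Example 2.2.10 (Several systems of equations solved simultaneously). Suppose we want to solve the three systems

\[
(1)\ \begin{aligned} 2x + y + 3z &= 1 \\ x - y + z &= 1 \\ x + y + 2z &= 1 \end{aligned}
\qquad
(2)\ \begin{aligned} 2x + y + 3z &= 2 \\ x - y + z &= 0 \\ x + y + 2z &= 1 \end{aligned}
\qquad
(3)\ \begin{aligned} 2x + y + 3z &= 0 \\ x - y + z &= 1 \\ x + y + 2z &= 1. \end{aligned}
\]

We form the matrix

\[
\left[\begin{array}{ccc|ccc} 2 & 1 & 3 & 1 & 2 & 0 \\ 1 & -1 & 1 & 1 & 0 & 1 \\ 1 & 1 & 2 & 1 & 1 & 1 \end{array}\right]
\quad\text{which row reduces to}\quad
\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & -2 & 2 & -5 \\ 0 & 1 & 0 & -1 & 1 & -2 \\ 0 & 0 & 1 & 2 & -1 & 4 \end{array}\right],
\]

where the columns of the first matrix are $A, \mathbf b_1, \mathbf b_2, \mathbf b_3$ and those of the second are $I, \tilde{\mathbf b}_1, \tilde{\mathbf b}_2, \tilde{\mathbf b}_3$.

The solution to the first system of equations is $\begin{bmatrix} -2 \\ -1 \\ 2 \end{bmatrix}$, i.e., $x = -2$, $y = -1$, $z = 2$; the solution to the second is $\begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix}$; the solution to the third is $\begin{bmatrix} -5 \\ -2 \\ 4 \end{bmatrix}$. △

Figure 2.2.5. Top: Three equations in three unknowns meet in a single point, representing the unique solution to three equations in three unknowns. Middle: Three equations in three unknowns have no solution. Bottom: Three equations in three unknowns have infinitely many solutions.


2.3 Matrix Inverses and Elementary Matrices

In this section we will see that matrix inverses give another way to solve equations. We will also introduce the modern view of row reduction: that a row operation is equivalent to multiplying a matrix by an elementary matrix.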


Only square matrices can have inverses: Exercise 2.3.1 asks you to (1) derive this from Theorem 2.2.4, and (2) show an example where $AB = I$, but $BA \neq I$. Such a $B$ would be only a “one-sided inverse” for $A$, not a real inverse; a “one-sided inverse” can give uniqueness or existence of solutions to $A\mathbf x = \mathbf b$, but not both.


To construct the matrix $[A|I]$ of Theorem 2.3.2, you put $A$ to the left of the corresponding identity matrix. By “corresponding” we mean that if $A$ is $n \times n$, then the identity matrix $I$ must be $n \times n$.
Solving equations with matrix inverses

Recall from Section 1.2 that the inverse of a matrix $A$ is another matrix $A^{-1}$ such that $AA^{-1} = A^{-1}A = I$, the identity. In that section we discussed two results involving inverses, Propositions 1.2.14 and 1.2.15. The first says that if a matrix has both a left and a right inverse, then those inverses are identical. The second says that the product of two invertible matrices is invertible, and that the inverse of the product is the product of the inverses, in reverse order.

Inverses give another way to solve equations. If a matrix $A$ has an inverse $A^{-1}$, then for any $\mathbf b$ the equation $A\mathbf x = \mathbf b$ has a unique solution, namely $\mathbf x = A^{-1}\mathbf b$.

One can verify that $A^{-1}\mathbf b$ is a solution by plugging it into the equation $A\mathbf x = \mathbf b$:

\[
A(A^{-1}\mathbf b) = (AA^{-1})\mathbf b = I\mathbf b = \mathbf b. \tag{2.3.1}
\]

This makes use of the associativity of matrix multiplication.

The following computation proves uniqueness:

\[
A\mathbf x = \mathbf b, \text{ so } A^{-1}A\mathbf x = A^{-1}\mathbf b; \text{ since } A^{-1}A\mathbf x = \mathbf x, \text{ we have } \mathbf x = A^{-1}\mathbf b. \tag{2.3.2}
\]

Again we use the associativity of matrix multiplication. Note that in Equation 2.3.1 the inverse of $A$ is on the right; in Equation 2.3.2 it is on the left.

The above argument, plus Theorem 2.2.5, proves the following proposition. 

Proposition 2.3.1. A matrix A is invertible if and only if it row reduces to 
the identity. 

In particular, to be invertible a matrix must be square. 

Computing matrix inverses 

Computing matrix inverses is rarely a good way to solve linear equations, but it is nevertheless a very important construction. Equation 1.2.15 shows how to compute the inverse of a $2 \times 2$ matrix. Analogous formulas exist for larger matrices, but they rapidly get out of hand. The effective way to compute matrix inverses for larger matrices is by row reduction:

Theorem 2.3.2 (Computing a matrix inverse). If $A$ is an $n \times n$ matrix, and you construct the $n \times 2n$ augmented matrix $[A|I]$ and row reduce it, then either:

(1) The first n columns row reduce to the identity, in which case the last 
n columns of the row-reduced matrix are the inverse of A, or 

(2) The first n columns do not row reduce to the identity, in which case 
A does not have an inverse. 
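Example 2.3.3 (Computing a matrix inverse).

\[
A = \begin{bmatrix} 2 & 1 & 3 \\ 1 & -1 & 1 \\ 1 & 1 & 2 \end{bmatrix}
\quad\text{has inverse}\quad
A^{-1} = \begin{bmatrix} 3 & -1 & -4 \\ 1 & -1 & -1 \\ -2 & 1 & 3 \end{bmatrix},
\]

because

\[
\left[\begin{array}{ccc|ccc} 2 & 1 & 3 & 1 & 0 & 0 \\ 1 & -1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 2 & 0 & 0 & 1 \end{array}\right]
\quad\text{row reduces to}\quad
\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 3 & -1 & -4 \\ 0 & 1 & 0 & 1 & -1 & -1 \\ 0 & 0 & 1 & -2 & 1 & 3 \end{array}\right].
\]

Exercise 2.3.3 asks you to confirm that you can use this inverse matrix to solve the system of Example 2.2.10. △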
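
Theorem 2.3.2 translates directly into a short procedure. The sketch below is illustrative only: it assumes a plain Gauss-Jordan reduction over exact fractions, and the names `rref` and `inverse` are not from the text. It is tried on the matrix of Example 2.3.3 and on the matrix of Examples 2.2.6 and 2.2.7, which has no inverse.

```python
from fractions import Fraction

def rref(rows):
    """Row reduce a matrix; return (reduced matrix, pivotal column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue                          # non-pivotal column
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]
        for i in range(len(m)):
            if i != r:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def inverse(A):
    """Row reduce [A|I]; return the inverse of A, or None if A has none."""
    n = len(A)
    aug = [list(A[i]) + [1 if i == j else 0 for j in range(n)] for i in range(n)]
    reduced, pivots = rref(aug)
    if pivots != list(range(n)):      # first n columns are not the identity
        return None
    return [row[n:] for row in reduced]

print(inverse([[2, 1, 3], [1, -1, 1], [1, 1, 2]]))
print(inverse([[2, 1, 3], [1, -1, 0], [1, 1, 2]]))
```

Unlike the hand computation, this sketch always finishes the row reduction before deciding; stopping as soon as a non-pivotal column appears among the first $n$ would be an easy optimization.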


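Example 2.3.4 (A matrix with no inverse). Consider the matrix of Examples 2.2.6 and 2.2.7, for two systems of linear equations, neither of which has a unique solution:

\[
\begin{bmatrix} 2 & 1 & 3 \\ 1 & -1 & 0 \\ 1 & 1 & 2 \end{bmatrix}. \tag{2.3.5}
\]

This matrix has no inverse $A^{-1}$ because

\[
\left[\begin{array}{ccc|ccc} 2 & 1 & 3 & 1 & 0 & 0 \\ 1 & -1 & 0 & 0 & 1 & 0 \\ 1 & 1 & 2 & 0 & 0 & 1 \end{array}\right]
\quad\text{row reduces to}\quad
\left[\begin{array}{ccc|ccc} 1 & 0 & 1 & 1 & 0 & -1 \\ 0 & 1 & 1 & -1 & 0 & 2 \\ 0 & 0 & 0 & -2 & 1 & 3 \end{array}\right].
\]

We haven't row reduced the matrix to echelon form; as soon as we see that the first three columns are not the identity matrix, there's no point in continuing; we already know that $A$ has no inverse.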


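Proof of Theorem 2.3.2. Suppose $[A|I]$ row reduces to $[I|B]$. Since $A$ row reduced to the identity, the $i$th column of $B$ is the solution $\mathbf x_i$ to the equation $A\mathbf x_i = \mathbf e_i$.

This uses Corollary 2.2.9. In Example 2.2.10 illustrating that corollary, $[A|B]$ row reduced to $[I|\tilde B]$, so the $i$th column of $\tilde B$ (i.e., $\tilde{\mathbf b}_i$) is the solution to the equation $A\mathbf x_i = \mathbf b_i$. We repeat the row reduction of that example here:

\[
\left[\begin{array}{ccc|ccc} 2 & 1 & 3 & 1 & 2 & 0 \\ 1 & -1 & 1 & 1 & 0 & 1 \\ 1 & 1 & 2 & 1 & 1 & 1 \end{array}\right]
\quad\text{row reduces to}\quad
\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & -2 & 2 & -5 \\ 0 & 1 & 0 & -1 & 1 & -2 \\ 0 & 0 & 1 & 2 & -1 & 4 \end{array}\right],
\]

(the columns on the left are $A, \mathbf b_1, \mathbf b_2, \mathbf b_3$; those on the right are $I, \tilde{\mathbf b}_1, \tilde{\mathbf b}_2, \tilde{\mathbf b}_3$), so $A\tilde{\mathbf b}_i = \mathbf b_i$.

Similarly, when $[A|I]$ row reduces to $[I|B]$, the $i$th column of $B$ (i.e., $\mathbf b_i$) is the solution to the equation $A\mathbf x_i = \mathbf e_i$:

\[
\left[\begin{array}{ccc|ccc} 2 & 1 & 3 & 1 & 0 & 0 \\ 1 & -1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 2 & 0 & 0 & 1 \end{array}\right]
\quad\text{row reduces to}\quad
\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 3 & -1 & -4 \\ 0 & 1 & 0 & 1 & -1 & -1 \\ 0 & 0 & 1 & -2 & 1 & 3 \end{array}\right], \tag{2.3.7}
\]

(the columns on the left are $A, \mathbf e_1, \mathbf e_2, \mathbf e_3$; those on the right are $I, \mathbf b_1, \mathbf b_2, \mathbf b_3$),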
Type 1: $E_1(i, x)$ (the identity matrix with the entry in row $i$, column $i$ replaced by $x$).

\[
E_1(3, 2) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]

Example type 1: $E_1(3, 2)$


Recall (Figure 1.2.3) that the $i$th row of $E_1 A$ depends on all the entries of $A$ but only the $i$th row of $E_1$.


Type 2: $E_2(i, j, x)$ (the identity matrix with the entry in row $i$, column $j$ replaced by $x$).

\[
E_2(1, 3, -3) = \begin{bmatrix} 1 & 0 & -3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

Example type 2: $E_2(1, 3, -3)$



so $A\mathbf b_i = \mathbf e_i$. So we have:

\[
A \underbrace{[\mathbf b_1, \mathbf b_2, \dots, \mathbf b_n]}_{B} = \underbrace{[\mathbf e_1, \mathbf e_2, \dots, \mathbf e_n]}_{I}. \tag{2.3.8}
\]

This tells us that $B$ is a right inverse of $A$: that $AB = I$.

We already know by Proposition 2.3.1 that if $A$ row reduces to the identity it is invertible, so by Proposition 1.2.14, $B$ is also a left inverse, hence the inverse of $A$. (At the end of this section we give a slightly different proof in terms of elementary matrices.) □

Elementary matrices 

After introducing matrix multiplication in Chapter 1, we may appear to have 
dropped it. We haven’t really. The modern view of row reduction is that any 
row operation can be performed on a matrix by multiplying A on the left by an 
elementary matrix. Elementary matrices will simplify a number of arguments 
further on in the book. 

There are three types of elementary matrices, all square , corresponding to the 
three kinds of row operations. They are defined in terms of the main diagonal, 
from top left to bottom right. We refer to them as “type 1,” “type 2,” and “type 
3,” but there is no standard numbering; we have listed them in the same order 
that we listed the corresponding row operations in Definition 2.1.1. 

Definition 2.3.5 (Elementary matrices). 

(1) The type 1 elementary matrix $E_1(i, x)$ is the square matrix where every entry on the main diagonal is 1 except for the $(i, i)$th entry, which is $x \neq 0$, and in which all other entries are zero.

(2) The type 2 elementary matrix $E_2(i, j, x)$, for $i \neq j$, is the matrix where all the entries on the main diagonal are 1, and all other entries are 0 except for the $(i, j)$th, which is $x$. (Remember that the first index, $i$, refers to which row, and the second, $j$, refers to which column. While the $(i, j)$th entry is $x$, the $(j, i)$th entry is 0.)

(3) The type 3 elementary matrix $E_3(i, j)$, $i \neq j$, is the matrix where the entries $i, j$ and $j, i$ are 1, as are all entries on the main diagonal except $i, i$ and $j, j$, which are 0. All the others are 0.

• Multiplying $A$ on the left by $E_1$ multiplies the $i$th row of $A$ by $x$: $E_1 A$ is identical to $A$ except that every entry of the $i$th row has been multiplied by $x$.

• Multiplying $A$ on the left by $E_2$ adds ($x$ times the $j$th row) to the $i$th row.

• The matrix $E_3 A$ is the matrix $A$ with the $i$th and the $j$th rows exchanged.
The type 3 elementary matrix $E_3(i, j)$ is shown at right. It is the matrix where the entries $i, j$ and $j, i$ are 1, as are all entries on the main diagonal except $i, i$ and $j, j$, which are 0. All the others are 0.

Type 3: $E_3(i, j)$

\[
E_3(2, 3) = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}
\]

Example type 3: $E_3(2, 3)$


Example 2.3.6 (Multiplication by an elementary matrix). We can multiply by 2 the third row of the matrix $A$, by multiplying it on the left by the type 1 elementary matrix $E_1(3, 2)$ shown at left. △

\[
\underbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}}_{\text{elementary matrix}}
\]

Multiplying $A$ by this elementary matrix multiplies the third row of $A$ by 2.


Exercise 2.3.8 asks you to confirm that multiplying a matrix A by the other 
types of elementary matrices is equivalent to performing the corresponding row 
operation. Exercise 2.1.3 asked you to show that it is possible to exchange rows 
using only the first two row operations. Exercise 2.3.14 asks you to show this 
in terms of elementary matrices. Exercise 2.3.12 asks you to check that column 
operations can be achieved by multiplication on the right by an elementary 
matrix of types 1,2, and 3 respectively. 
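
The correspondence between row operations and left multiplication by elementary matrices can be tried out directly. The following is a minimal sketch using 1-based row indices as in Definition 2.3.5; the helper names (`identity`, `E1`, `E2`, `E3`, `matmul`) and the $4 \times 3$ matrix `A` are assumptions chosen for illustration, not taken from the text.

```python
def identity(n):
    """The n x n identity matrix as a list of rows."""
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]

def E1(n, i, x):
    """Type 1: identity with the (i, i)th entry replaced by x (x != 0)."""
    E = identity(n)
    E[i - 1][i - 1] = x
    return E

def E2(n, i, j, x):
    """Type 2: identity with the (i, j)th entry replaced by x (i != j)."""
    E = identity(n)
    E[i - 1][j - 1] = x
    return E

def E3(n, i, j):
    """Type 3: identity with rows i and j exchanged."""
    E = identity(n)
    E[i - 1], E[j - 1] = E[j - 1], E[i - 1]
    return E

def matmul(P, Q):
    """Matrix product PQ for matrices stored as lists of rows."""
    return [[sum(P[r][k] * Q[k][c] for k in range(len(Q)))
             for c in range(len(Q[0]))] for r in range(len(P))]

A = [[1, 0, 1], [0, 2, 1], [2, 1, 0], [1, 1, 1]]   # hypothetical 4 x 3 matrix

print(matmul(E1(4, 3, 2), A))      # third row of A multiplied by 2
print(matmul(E2(4, 1, 3, -3), A))  # (-3 times the third row) added to the first row
print(matmul(E3(4, 2, 3), A))      # second and third rows exchanged
```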

Elementary matrices are invertible 

One very important property of elementary matrices is that they are invertible , 
and that their inverses are also elementary matrices. This is another way of 
saying that any row operation can be undone by another elementary operation. 
It follows from Proposition 1.2.15 that any product of elementary matrices is 
also invertible. 

Proposition 2.3.7. Any elementary matrix is invertible. More precisely,

(1) $(E_1(i, x))^{-1} = E_1(i, 1/x)$: the inverse is formed by replacing the $x$ in the $(i, i)$th position by $1/x$. This undoes multiplication of the $i$th row by $x$.

(2) $(E_2(i, j, x))^{-1} = E_2(i, j, -x)$: the inverse is formed by replacing the $x$ in the $(i, j)$th position by $-x$. This subtracts $x$ times the $j$th row from the $i$th row.

(3) $(E_3(i, j))^{-1} = E_3(i, j)$: multiplication by the inverse exchanges rows $i$ and $j$ a second time, undoing the first change.

The proof is left as Exercise 2.3.13.
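
The three inverse formulas are easy to spot-check numerically. This is a sketch, not a proof: the helper names are hypothetical, indices are 1-based as in the text, and exact fractions are used for $1/x$.

```python
from fractions import Fraction

def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]

def elementary(n, kind, i, j=None, x=None):
    """Build E1(i, x), E2(i, j, x) or E3(i, j) of size n (1-based indices)."""
    E = identity(n)
    if kind == 1:
        E[i - 1][i - 1] = x                          # scale row i by x
    elif kind == 2:
        E[i - 1][j - 1] = x                          # add x times row j to row i
    else:
        E[i - 1], E[j - 1] = E[j - 1], E[i - 1]      # exchange rows i and j
    return E

def matmul(P, Q):
    return [[sum(P[r][k] * Q[k][c] for k in range(len(Q)))
             for c in range(len(Q[0]))] for r in range(len(P))]

I4 = identity(4)
# (1) E1(i, x) E1(i, 1/x) = I
assert matmul(elementary(4, 1, 3, x=2), elementary(4, 1, 3, x=Fraction(1, 2))) == I4
# (2) E2(i, j, x) E2(i, j, -x) = I
assert matmul(elementary(4, 2, 1, 3, x=-3), elementary(4, 2, 1, 3, x=3)) == I4
# (3) E3(i, j) E3(i, j) = I
assert matmul(elementary(4, 3, 2, 3), elementary(4, 3, 2, 3)) == I4
```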
Of course in ordinary arithmetic you can't conclude from $4 \times 6 = 3 \times 8$ that $4 = 3$ and $6 = 8$, but in matrix multiplication if $[E][A|I] = [I|B]$, then $[E][A] = I$ and $[E][I] = B$, since the multiplication of each column of $A$ and of $I$ by $[E]$ occurs independently of all other columns:

\[
[E][A|I] = [EA|EI].
\]


Equation 2.3.10 shows that when row reducing a matrix of the form $[A|I]$, the right-hand side of that augmented matrix serves to keep track of the row operations needed to reduce the matrix to echelon form; at the end of the procedure, $I$ has been row reduced to $E_k \cdots E_1 = B$, which is precisely (in elementary matrix form) the series of row operations used.



Proving Theorem 2.3.2 with elementary matrices 

We can now give a slightly different proof of Theorem 2.3.2 using elementary 
matrices. 

(1) Suppose that $[A|I]$ row reduces to $[I|B]$. This can be expressed as multiplication on the left by elementary matrices:

\[
E_k \cdots E_1 [A|I] = [I|B]. \tag{2.3.9}
\]

The left and right halves of Equation 2.3.9 give

\[
E_k \cdots E_1 A = I \quad\text{and}\quad E_k \cdots E_1 I = B. \tag{2.3.10}
\]

Thus $B$ is a product of elementary matrices, which are invertible, so (by Proposition 1.2.15) $B$ is invertible: $B^{-1} = E_1^{-1} \cdots E_k^{-1}$. Moreover, substituting the right equation of 2.3.10 into the left equation gives $BA = I$, so $B$ is a left inverse of $A$. We don't need to check that it is also a right inverse, but doing so is straightforward: multiplying $BA = I$ by $B^{-1}$ on the left and $B$ on the right gives

\[
I = B^{-1} I B = B^{-1}(BA)B = (B^{-1}B)AB = AB. \tag{2.3.11}
\]

So $B$ is also a right inverse of $A$.

(2) If $[A|I]$ row reduces to $[A'|A'']$, where $A'$ is not the identity, then (by Theorem 2.2.5) the equation $A\mathbf x_i = \mathbf e_i$ either has no solution or has infinitely many solutions for each $i = 1, \dots, n$. In either case, $A$ is noninvertible. □

2.4 Linear Combinations, Span, and Linear Independence

In 1750, questioning the general assumption that every system of $n$ linear equations in $n$ unknowns has a unique solution, the great mathematician Leonhard Euler pointed out the case of the two equations $3x - 2y = 5$ and $4y = 6x - 10$. “We will see that it is not possible to determine the two unknowns $x$ and $y$,” he wrote, “since when one is eliminated, the other disappears by itself, and we are left with an identity from which we can determine nothing. The reason for this accident is immediately obvious, since the second equation can be changed to $6x - 4y = 10$, which, being just the double of the first, is in no way different from it.”

Euler concluded by noting that when claiming that $n$ equations are sufficient to determine $n$ unknowns, “one must add the restriction that all the equations be different from each other, and that none of them is included in the others.” Euler's “descriptive and qualitative approach” represented the beginning of a
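More generally, these ideas apply in all linear settings, such as function spaces and integral and differential equations. Any time the notion of linear combination makes sense one can talk about linear independence, span, kernels, and so forth.


Example 2.4.1 (Linear combination). The vector $\begin{bmatrix} 3 \\ 4 \end{bmatrix}$ is a linear combination of the standard basis vectors $\mathbf e_1$ and $\mathbf e_2$, since

\[
\begin{bmatrix} 3 \\ 4 \end{bmatrix} = 3 \begin{bmatrix} 1 \\ 0 \end{bmatrix} + 4 \begin{bmatrix} 0 \\ 1 \end{bmatrix}.
\]

But a vector in $\mathbb R^3$ with a non-zero third entry is not a linear combination of the vectors

\[
\mathbf e_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \quad\text{and}\quad \mathbf e_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}.
\]
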


new way of thinking. 5 At the time, mathematicians were interested in solving 
individual systems of equations, not in analyzing them. Even Euler began his 
argument by pointing out that attempts to solve the system fail; only then did 
he explain this failure by the obvious fact that 3x — 2y = 5 and 4 y = 6x — 10 
are really the same equation. 

Today, linear algebra provides a systematic approach to both analyzing and solving systems of linear equations, which was completely unknown in Euler's time. We have already seen something of its power. Row reduction to echelon form puts a system of linear equations in a form where it is easy to analyze. Theorem 2.2.4 then tells us how to read that matrix, to find out whether the system has no solution, infinitely many solutions, or a unique solution (and, in the latter case, what it is).

Now we will introduce vocabulary that describes concepts implicit in what we have done so far. The notions linear combinations, span and linear independence give a precise way to answer the questions, given a collection of linear equations, how many genuinely different equations do we have? How many can be derived from the others?

Definition 2.4.2 (Linear combinations). If $\mathbf v_1, \dots, \mathbf v_k$ is a collection of vectors in $\mathbb R^n$, then a linear combination of the $\mathbf v_i$ is a vector $\mathbf w$ of the form

\[
\mathbf w = \sum_{i=1}^{k} a_i \mathbf v_i \tag{2.4.1}
\]

for any scalars $a_i$.

In other words, the vector $\mathbf w$ is the sum of the vectors $\mathbf v_1, \dots, \mathbf v_k$, each multiplied by a coefficient.

The notion of span is a way of talking about the existence of solutions to 
linear equations. 

Definition 2.4.3 (Span). The span of $\mathbf v_1, \dots, \mathbf v_k$ is the set of linear combinations $a_1 \mathbf v_1 + \cdots + a_k \mathbf v_k$. It is denoted $\operatorname{Sp}(\mathbf v_1, \dots, \mathbf v_k)$.

The word span is also used as a verb. For instance, the standard basis vectors $\mathbf e_1$ and $\mathbf e_2$ span $\mathbb R^2$ but not $\mathbb R^3$. They span the plane, because any vector in the plane is a linear combination $a_1 \mathbf e_1 + a_2 \mathbf e_2$.

Geometrically, this means that any point in the x , y plane can be written in 
terms of its x and y coordinates. The vectors u and v shown in Figure 2.4.1 
also span the plane. 

5 Jean-Luc Dorier, ed., L'Enseignement de l'algèbre linéaire en question, La Pensée Sauvage, Éditions, 1997. Euler's description, which we have roughly translated, is from “Sur une Contradiction Apparente dans la Doctrine des Lignes Courbes,” Mémoires de l'Académie des Sciences de Berlin 4 (1750).
You are asked to show in Exercise 2.4.1 that $\operatorname{Sp}(\mathbf v_1, \dots, \mathbf v_k)$ is a subspace of $\mathbb R^n$ and is the smallest subspace containing $\mathbf v_1, \dots, \mathbf v_k$.


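Examples 2.4.4 (Span: two easy cases). In simple cases it is possible to see immediately whether a given vector is in the span of a set of vectors.

(1) Is the vector $\mathbf u$ in the span of $\mathbf w = \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}$? Clearly not; no multiple of 0 will give the 1 in the second position of $\mathbf u$.

(2) Given the vectors

\[
\mathbf v_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ -1 \end{bmatrix},\quad
\mathbf v_2 = \begin{bmatrix} -2 \\ -1 \\ 1 \\ 0 \end{bmatrix},\quad
\mathbf v_3 = \begin{bmatrix} 1 \\ 1 \\ -1 \\ 1 \end{bmatrix},\quad
\mathbf v_4 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \tag{2.4.2}
\]

is $\mathbf v_4$ in the span of $\{\mathbf v_1, \mathbf v_2, \mathbf v_3\}$? Check your answer below.⁶

Figure 2.4.1. The vectors $\mathbf u$ and $\mathbf v$ span the plane: any vector, such as $\mathbf a$, can be expressed as the sum of components in the directions $\mathbf u$ and $\mathbf v$, i.e., multiples of $\mathbf u$ and $\mathbf v$.
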


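Example 2.4.5 (Row reducing to check span). When it's not immediately obvious whether a vector is in the span of other vectors, row reduction gives the answer. Given the vectors

\[
\mathbf w_1 = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix},\quad
\mathbf w_2 = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix},\quad
\mathbf w_3 = \begin{bmatrix} 3 \\ 0 \\ 2 \end{bmatrix},\quad
\mathbf v = \begin{bmatrix} 3 \\ 3 \\ 1 \end{bmatrix},
\]

is $\mathbf v$ in the span of the other three? Here the answer is not apparent, so we can take a more systematic approach. If $\mathbf v$ is in the span of $\{\mathbf w_1, \mathbf w_2, \mathbf w_3\}$, then $x_1 \mathbf w_1 + x_2 \mathbf w_2 + x_3 \mathbf w_3 = \mathbf v$; i.e., (writing $\mathbf w_1$, $\mathbf w_2$ and $\mathbf w_3$ in terms of their entries) there is a solution to the set of equations

\[
\begin{aligned}
2x_1 + x_2 + 3x_3 &= 3 \\
x_1 - x_2 \phantom{{}+ 3x_3} &= 3 \\
x_1 + x_2 + 2x_3 &= 1.
\end{aligned} \tag{2.4.4}
\]

Theorem 2.2.4 tells us how to solve this system; we make a matrix and row reduce it:

\[
\begin{bmatrix} 2 & 1 & 3 & 3 \\ 1 & -1 & 0 & 3 \\ 1 & 1 & 2 & 1 \end{bmatrix}
\quad\text{row reduces to}\quad
\begin{bmatrix} 1 & 0 & 1 & 2 \\ 0 & 1 & 1 & -1 \\ 0 & 0 & 0 & 0 \end{bmatrix}. \tag{2.4.5}
\]

Like the word onto, the word span is a way to talk about the existence of solutions.

We used Matlab to row reduce the matrix in Equation 2.4.5, as we don't enjoy row reduction and tend to make mistakes.
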

6 No. It is impossible to write $\mathbf v_4$ as a linear combination of the other three vectors. Since the second and third entries of $\mathbf v_1$ are 0, if $\mathbf v_4$ were in the span of $\{\mathbf v_1, \mathbf v_2, \mathbf v_3\}$, its second and third entries would depend only on $\mathbf v_2$ and $\mathbf v_3$. To achieve the 0 of the second position, we must give equal weights to $\mathbf v_2$ and $\mathbf v_3$, but then we would also have a 0 in the third position, whereas we need a 1.





Since the last column of the row reduced matrix contains no pivotal 1, there is a solution: $\mathbf v$ is in the span of the other three vectors. But the solution is not unique: $\tilde A$ has a column with no pivotal 1, so there are infinitely many ways to express $\mathbf v$ as a linear combination of $\{\mathbf w_1, \mathbf w_2, \mathbf w_3\}$. For example,

\[
\mathbf v = 2\mathbf w_1 - \mathbf w_2 = \mathbf w_1 - 2\mathbf w_2 + \mathbf w_3. \qquad △
\]
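
The span test of Example 2.4.5 can be sketched as a small function: a vector is in the span exactly when the augmented matrix has no pivotal 1 in its last column. The names `rref` and `in_span` are illustrative, and the row reduction is a plain Gauss-Jordan pass over exact fractions rather than anything specific to the text.

```python
from fractions import Fraction

def rref(rows):
    """Row reduce a matrix; return (reduced matrix, pivotal column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue                          # non-pivotal column
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]
        for i in range(len(m)):
            if i != r:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def in_span(vectors, v):
    """True if v is a linear combination of the given vectors."""
    # The vectors become the columns of A; v is the extra last column.
    aug = [list(row) + [vi] for row, vi in zip(zip(*vectors), v)]
    _, pivots = rref(aug)
    return len(vectors) not in pivots         # no pivotal 1 in the last column

w1, w2, w3 = [2, 1, 1], [1, -1, 1], [3, 0, 2]
v = [3, 3, 1]
print(in_span([w1, w2, w3], v))               # Example 2.4.5
# The two combinations given in the example both reproduce v.
assert [2 * a - b for a, b in zip(w1, w2)] == v
assert [a - 2 * b + c for a, b, c in zip(w1, w2, w3)] == v
```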


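Is the vector $\mathbf v = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$ in the span of the vectors $\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}$, and $\begin{bmatrix} 1 \\ 3 \\ 0 \end{bmatrix}$? Is $\mathbf w = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$ in the span of $\begin{bmatrix} 2 \\ 2 \\ 0 \end{bmatrix}$, $\begin{bmatrix} -2 \\ -1 \\ 2 \end{bmatrix}$, and $\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}$? Check your answers below.⁷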


The vectors ei and £2 are lin- 
early independent. There is only 

one way to write 1 ^ 


ei and e2: 

1 
0 


*f 4 


in terms of 


But if we give ourselves a third 
3 " 


vector, say v = 
also write 


, then we can 


+ 2 


The three vectors ei , ea and v are 
not linearly independent. 


Linear independence 

Linear independence is a way to talk about uniqueness of solutions to linear 
equations. 

Definition 2.4.6 (Linear independence). The vectors $\mathbf v_1, \dots, \mathbf v_k$ are linearly independent if there is at most one way of writing a vector $\mathbf w$ as a linear combination of $\mathbf v_1, \dots, \mathbf v_k$, i.e., if

\[
\mathbf w = \sum_{i=1}^{k} x_i \mathbf v_i = \sum_{i=1}^{k} y_i \mathbf v_i \quad\text{implies}\quad x_1 = y_1,\ x_2 = y_2,\ \dots,\ x_k = y_k. \tag{2.4.6}
\]

(Note that the unknowns in Definition 2.4.6 are the coefficients $x_i$.) In other words, if $\mathbf v_1, \dots, \mathbf v_k$ are linearly independent, then if the system of equations

\[
\mathbf w = x_1 \mathbf v_1 + x_2 \mathbf v_2 + \cdots + x_k \mathbf v_k \tag{2.4.7}
\]

has a solution, the solution is unique.


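7 Yes, $\mathbf v$ is in the span of the others: $\mathbf v = 3 \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} - 2 \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix} + \begin{bmatrix} 1 \\ 3 \\ 0 \end{bmatrix}$, since the matrix

\[
\begin{bmatrix} 1 & 2 & 1 & 0 \\ 0 & 1 & 3 & 1 \\ 1 & 1 & 0 & 1 \end{bmatrix}
\quad\text{row reduces to}\quad
\begin{bmatrix} 1 & 0 & 0 & 3 \\ 0 & 1 & 0 & -2 \\ 0 & 0 & 1 & 1 \end{bmatrix}.
\]

No, $\mathbf w$ is not in the span of the others (as you might have suspected, since $\begin{bmatrix} 2 \\ 2 \\ 0 \end{bmatrix}$ is a multiple of $\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}$). If we row reduce the appropriate matrix we get

\[
\begin{bmatrix} 1 & 0 & 1/2 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix};
\]

the system of equations has no solution.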
Like the term one to one, the term linear independence is a way to talk about uniqueness of solutions.


The pivotal columns of a matrix are linearly independent.

Non-pivotal is another way of saying linearly dependent.


Saying that $\mathbf v$ is in the span of the vectors $\mathbf w_1, \mathbf w_2, \mathbf w_3, \mathbf w_4$ means that the system of equations has a solution; since the four vectors are not linearly independent, the solution is not unique.

In the case of three linearly independent vectors in $\mathbb R^3$, there is a unique solution for every $\mathbf b \in \mathbb R^3$, but uniqueness is irrelevant to the question of whether the vectors span $\mathbb R^3$; span is concerned only with existence of solutions.


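Example 2.4.7 (Linearly independent vectors). Are the vectors

\[
\mathbf w_1 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix},\quad
\mathbf w_2 = \begin{bmatrix} -2 \\ 1 \\ 2 \end{bmatrix},\quad
\mathbf w_3 = \begin{bmatrix} -1 \\ 1 \\ -1 \end{bmatrix}
\]

linearly independent? Theorem 2.2.5 says that a system $A\mathbf x = \mathbf b$ of $n$ equations in $n$ unknowns has a unique solution for every $\mathbf b$ if and only if $A$ row reduces to the identity. The matrix

\[
\begin{bmatrix} 1 & -2 & -1 \\ 2 & 1 & 1 \\ 3 & 2 & -1 \end{bmatrix}
\quad\text{row reduces to}\quad
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

(the columns of the first matrix are $\mathbf w_1, \mathbf w_2, \mathbf w_3$), so the vectors are linearly independent: there is only one way to write a vector $\mathbf v_i$ in $\mathbb R^3$ as the linear combination $a_i \mathbf w_1 + b_i \mathbf w_2 + c_i \mathbf w_3$. These three vectors also span $\mathbb R^3$, since we know from Theorem 2.2.5 that any vector in $\mathbb R^3$ can be written as a linear combination of them. △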


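Example 2.4.8 (Linearly dependent vectors). If we make the collection of vectors in Example 2.4.7 linearly dependent, by adding a vector that is a linear combination of some of them, say $\mathbf w_4 = 2\mathbf w_2 + \mathbf w_3$:

\[
\mathbf w_1 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix},\quad
\mathbf w_2 = \begin{bmatrix} -2 \\ 1 \\ 2 \end{bmatrix},\quad
\mathbf w_3 = \begin{bmatrix} -1 \\ 1 \\ -1 \end{bmatrix},\quad
\mathbf w_4 = \begin{bmatrix} -5 \\ 3 \\ 3 \end{bmatrix}, \tag{2.4.8}
\]

and use them to express some arbitrary vector⁸ in $\mathbb R^3$, say $\mathbf v = \begin{bmatrix} -7 \\ -2 \\ 1 \end{bmatrix}$, we get

\[
\begin{bmatrix} 1 & -2 & -1 & -5 & -7 \\ 2 & 1 & 1 & 3 & -2 \\ 3 & 2 & -1 & 3 & 1 \end{bmatrix}
\quad\text{which row reduces to}\quad
\begin{bmatrix} 1 & 0 & 0 & 0 & -2 \\ 0 & 1 & 0 & 2 & 3 \\ 0 & 0 & 1 & 1 & -1 \end{bmatrix}. \tag{2.4.9}
\]

Since the fourth column is non-pivotal and the last column has no pivotal 1, the system has infinitely many solutions: there are infinitely many ways to write $\mathbf v$ as a linear combination of the vectors $\mathbf w_1, \mathbf w_2, \mathbf w_3, \mathbf w_4$. The vector $\mathbf v$ is in the span of those vectors, but they are not linearly independent. △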
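
Examples 2.4.7 and 2.4.8 suggest a mechanical test: put the vectors in the columns of a matrix and check whether every column is pivotal after row reduction. The sketch below assumes that criterion and a plain Gauss-Jordan reduction over exact fractions; the names `rref` and `linearly_independent` are illustrative.

```python
from fractions import Fraction

def rref(rows):
    """Row reduce a matrix; return (reduced matrix, pivotal column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue                          # non-pivotal column
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]
        for i in range(len(m)):
            if i != r:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def linearly_independent(vectors):
    """True if every column of the matrix [v_1, ..., v_k] is pivotal."""
    rows = [list(row) for row in zip(*vectors)]   # vectors become columns
    _, pivots = rref(rows)
    return len(pivots) == len(vectors)

w1, w2, w3 = [1, 2, 3], [-2, 1, 2], [-1, 1, -1]
w4 = [2 * a + b for a, b in zip(w2, w3)]          # w4 = 2 w2 + w3 = (-5, 3, 3)

print(linearly_independent([w1, w2, w3]))         # Example 2.4.7
print(linearly_independent([w1, w2, w3, w4]))     # Example 2.4.8
```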


It is clear from Theorem 2.2.5 that three linearly independent vectors in $\mathbb R^3$ span $\mathbb R^3$: three linearly independent vectors in $\mathbb R^3$ row reduce to the identity,


8 Actually, not quite arbitrary. The first choice was $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$, but that resulted in messy fractions, so we looked for a vector that gave a neater answer.
which means that (considering the three vectors as the matrix $A$) there will be a unique solution for $A\mathbf x = \mathbf b$ for every $\mathbf b \in \mathbb R^3$.

But linear independence does not guarantee the existence of a solution, as we see below.


Calling a single vector linearly 
independent may seem bizarre; 
the word independence seems to 
imply that there is something to 
be independent from. But one can 
easily verify that the case of one 
vector is simply Definition 2.4.6
with k = 1; excluding that case
from the definition would create 
all sorts of difficulties. 


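Example 2.4.9. Let us modify the vectors of Example 2.4.7 to get

$$\vec u_1 = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 0 \end{bmatrix}, \quad \vec u_2 = \begin{bmatrix} -2 \\ 1 \\ 2 \\ -1 \end{bmatrix}, \quad \vec u_3 = \begin{bmatrix} -1 \\ 1 \\ -1 \\ 1 \end{bmatrix}, \quad \vec v = \begin{bmatrix} -7 \\ -2 \\ 1 \\ 1 \end{bmatrix}.$$

The vectors $\vec u_1, \vec u_2, \vec u_3 \in \mathbb{R}^4$ are linearly independent, but $\vec v$ is not in their span:
the matrix

$$\begin{bmatrix} 1 & -2 & -1 & -7 \\ 2 & 1 & 1 & -2 \\ 3 & 2 & -1 & 1 \\ 0 & -1 & 1 & 1 \end{bmatrix} \quad\text{row reduces to}\quad \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}. \tag{2.4.11}$$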


The pivotal 1 in the last column tells us that the system of equations has no
solution, as you would expect: it is rare that four equations in three unknowns
will be compatible. △

Linear independence is not restricted to vectors in $\mathbb{R}^n$: it also
applies to matrices (and more generally, to elements of arbitrary vector
spaces). The matrices A, B and C are linearly independent if
the only solution to

$$\alpha_1 A + \alpha_2 B + \alpha_3 C = 0 \quad\text{is}\quad \alpha_1 = \alpha_2 = \alpha_3 = 0.$$

Example 2.4.10 (Geometrical interpretation of linear independence).

(1) One vector is linearly independent if it isn't the zero vector.

(2) Two vectors are linearly independent if they do not both lie on the same
line.

(3) Three vectors are linearly independent if they do not all lie in the same
plane.

These are not separate definitions; they are all examples of Definition 2.4.6.


Alternative definitions of linear independence 

Many books define linear independence as follows, then prove the equivalence 
of our definition: 

Definition 2.4.11 (Alternative definition of linear independence).

A set of k vectors $\vec v_1, \dots, \vec v_k$ is linearly independent if and only if the only
solution to

$$a_1\vec v_1 + a_2\vec v_2 + \dots + a_k\vec v_k = \vec 0 \quad\text{is}\quad a_1 = a_2 = \dots = a_k = 0. \tag{2.4.12}$$

More generally, more than n
vectors in $\mathbb{R}^n$ are never linearly independent,
and fewer than n vectors never span.

The matrix of Equation 2.4.16
could have fewer than four pivotal
columns, and if it has four, they
need not be the first four. All that
matters for the proof is to know
that at least one of the first five
columns must be non-pivotal.

The matrix of Equation 2.4.18,

$$[\underbrace{\vec v_1, \vec v_2, \vec v_3, \vec v_4}_{A},\ \underbrace{\vec w}_{\vec b}\,],$$

corresponds to the equation

$$\begin{bmatrix} v_{1,1} & v_{1,2} & v_{1,3} & v_{1,4} \\ v_{2,1} & v_{2,2} & v_{2,3} & v_{2,4} \\ v_{3,1} & v_{3,2} & v_{3,3} & v_{3,4} \\ v_{4,1} & v_{4,2} & v_{4,3} & v_{4,4} \\ v_{5,1} & v_{5,2} & v_{5,3} & v_{5,4} \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix} = \begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ w_4 \\ w_5 \end{bmatrix}.$$

In one direction, we know that $0\vec v_1 + 0\vec v_2 + \dots + 0\vec v_k = \vec 0$, so if the coefficients
do not all equal 0, there will be two ways of writing the zero vector as a linear
combination, which contradicts Definition 2.4.6. In the other direction, if

$$b_1\vec v_1 + \dots + b_k\vec v_k \quad\text{and}\quad c_1\vec v_1 + \dots + c_k\vec v_k \tag{2.4.13}$$

are two ways of writing a vector $\vec u$ as a linear combination of $\vec v_1, \dots, \vec v_k$, then

$$(b_1 - c_1)\vec v_1 + \dots + (b_k - c_k)\vec v_k = \vec 0. \tag{2.4.14}$$

But if the only solution to that equation is $b_1 - c_1 = 0, \dots, b_k - c_k = 0$, then
there is only one way of writing $\vec u$ as a linear combination of the $\vec v_i$.

Yet another equivalent statement (as you are asked to show in Exercise 2.4.2)
is to say that the $\vec v_i$ are linearly independent if none of the $\vec v_i$ is a linear combination
of the others.

How many vectors that span $\mathbb{R}^n$ can be linearly independent?

The following theorem is basic to the entire theory:

Theorem 2.4.12. In $\mathbb{R}^n$, n + 1 vectors are never linearly independent, and
n - 1 vectors never span.

Proof. First, we will show that in $\mathbb{R}^4$, five vectors are never linearly independent;
the general case is exactly the same, and a bit messier to write.

If we express a vector $\vec w \in \mathbb{R}^4$ using the five vectors

$$\vec v_1, \vec v_2, \vec v_3, \vec v_4, \vec u, \tag{2.4.15}$$

at least one column is non-pivotal, since there can be at most four pivotal 1's:

$$\begin{bmatrix} 1 & 0 & 0 & 0 & \tilde u_1 & \tilde w_1 \\ 0 & 1 & 0 & 0 & \tilde u_2 & \tilde w_2 \\ 0 & 0 & 1 & 0 & \tilde u_3 & \tilde w_3 \\ 0 & 0 & 0 & 1 & \tilde u_4 & \tilde w_4 \end{bmatrix}. \tag{2.4.16}$$

(The tilde indicates the row reduced entry: row reduction turns $u_1$ into $\tilde u_1$.)
So there are infinitely many solutions: infinitely many ways that $\vec w$ can be
expressed as a linear combination of the vectors $\vec v_1, \vec v_2, \vec v_3, \vec v_4, \vec u$.

Next, we need to prove that n - 1 vectors never span $\mathbb{R}^n$. We will show that
four vectors never span $\mathbb{R}^5$; the general case is the same.

Saying that four vectors do not span $\mathbb{R}^5$ means that there exists a vector
$\vec w \in \mathbb{R}^5$ such that the equation

$$a_1\vec v_1 + a_2\vec v_2 + a_3\vec v_3 + a_4\vec v_4 = \vec w \tag{2.4.17}$$

has no solutions. Indeed, if we row reduce the matrix

$$[\vec v_1, \vec v_2, \vec v_3, \vec v_4, \vec w], \tag{2.4.18}$$

we end up with at least one row with at least four zeroes: any row of $\tilde A$ must
either contain a pivotal 1 or be all 0's, and we have five rows but at most four
pivotal 1's:

$$\begin{bmatrix} 1 & 0 & 0 & 0 & \tilde w_1 \\ 0 & 1 & 0 & 0 & \tilde w_2 \\ 0 & 0 & 1 & 0 & \tilde w_3 \\ 0 & 0 & 0 & 1 & \tilde w_4 \\ 0 & 0 & 0 & 0 & \tilde w_5 \end{bmatrix}. \tag{2.4.19}$$

Set $\tilde w_5 = 1$, and set the other entries of $\tilde{\vec w}$ to 0; then $\tilde w_5$ is a pivotal 1, and the
system has no solutions; $\vec w$ is outside the span of our four vectors. (See the box
in the margin if you don't see why we were allowed to set $\tilde w_5 = 1$.) □

What gives us the right to set
$\tilde w_5$ to 1 and set the other entries of
$\tilde{\vec w}$ to 0? To prove that four vectors
never span $\mathbb{R}^5$, we need to find
just one vector $\vec w$ for which the
equations are incompatible. Since
any row operation can be undone,
we can assign any values we like to
our $\tilde{\vec w}$, and then bring the echelon
matrix back to where it started,
$[\vec v_1, \vec v_2, \vec v_3, \vec v_4, \vec w]$. The vector $\vec w$
that we get by starting with

$$\tilde{\vec w} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}$$

and undoing the row operations
is a vector that makes the system
incompatible.


We can look at the same thing in terms of multiplication by elementary
matrices; here we will treat the general case of n - 1 vectors in $\mathbb{R}^n$. Suppose
that the row reduction of $[\vec v_1, \dots, \vec v_{n-1}]$ is achieved by multiplying on the left
by the product of elementary matrices $E = E_k \cdots E_1$, so that

$$E([\vec v_1, \dots, \vec v_{n-1}]) = \tilde V \tag{2.4.20}$$

is in echelon form; hence its bottom row is all zeroes.

Thus, to show that our n - 1 vectors do not span $\mathbb{R}^n$, we want the last column
of the augmented, row-reduced matrix to be

$$\tilde{\vec w} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} = \vec e_n; \tag{2.4.21}$$

we will then have a system of equations with no solution. We can achieve that
by taking $\vec w = E^{-1}\vec e_n$: the system of linear equations
$a_1\vec v_1 + \dots + a_{n-1}\vec v_{n-1} = E^{-1}\vec e_n$ has no solutions.
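The construction $\vec w = E^{-1}\vec e_n$ can be traced step by step. The sketch below is an illustration under stated assumptions: `rref` is a helper written here (its `ncols` argument, which restricts pivots to the first columns, is not from the text), and the three vectors are those of Example 2.4.9. Row reducing $[V \mid I]$ with pivots taken only in the $V$ block leaves $E$ in the right-hand block; solving $E\vec w = \vec e_n$ then gives a vector outside the span.

```python
from fractions import Fraction

def rref(rows, ncols=None):
    """Row reduce, looking for pivots only in the first `ncols` columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    ncols = len(m[0]) if ncols is None else ncols
    pivots, r = [], 0
    for c in range(ncols):
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots

# Columns are u1, u2, u3 of Example 2.4.9: three vectors in R^4.
V = [[1, -2, -1],
     [2,  1,  1],
     [3,  2, -1],
     [0, -1,  1]]
n, k = len(V), len(V[0])
identity = [[int(i == j) for j in range(n)] for i in range(n)]

# Row reduce [V | I] using pivots in the V block only; the right block is E.
aug, _ = rref([V[i] + identity[i] for i in range(n)], ncols=k)
E = [row[k:] for row in aug]

# Solve E w = e_n by row reducing [E | e_n]; E is invertible, so the
# left block becomes the identity and the last column becomes w.
solved, _ = rref([E[i] + [int(i == n - 1)] for i in range(n)], ncols=n)
w = [row[-1] for row in solved]

# The augmented matrix [V | w] has a pivotal 1 in its last column,
# so w is not in the span of the columns of V.
_, pivots = rref([V[i] + [w[i]] for i in range(n)])
print(w, pivots)
```

For these particular vectors no row exchanges are needed, so undoing the row operations leaves the last standard basis vector unchanged.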


A set of vectors as a basis 

Choosing a basis for a subspace of $\mathbb{R}^n$, or for $\mathbb{R}^n$ itself, is like choosing axes
(with units marked) in the plane or in space. This allows us to pass from
non-coordinate geometry (synthetic geometry) to coordinate geometry (analytic
geometry). Bases provide a "frame of reference" for vectors in a subspace.

Definition 2.4.13 (Basis). Let $V \subset \mathbb{R}^n$ be a subspace. An ordered set of
vectors $\vec v_1, \dots, \vec v_k \in V$ is called a basis of V if it satisfies one of the three
equivalent conditions.

(1) The set is a maximal linearly independent set: it is independent, and
if you add one more vector, the set will no longer be linearly independent.

(2) The set is a minimal spanning set: it spans V, and if you drop one
vector, it will no longer span V.

(3) The set is a linearly independent set spanning V.

The direction of basis vectors
gives the direction of the axes; the
length of the basis vectors provides
units for those axes.

Recall (Definition 1.1.5) that
a subspace of $\mathbb{R}^n$ is a subset of
$\mathbb{R}^n$ that is closed under addition
and closed under multiplication by
scalars. Requiring that the vectors
be ordered is just a convenience.

The standard basis vectors

Figure 2.4.2. The standard basis would not
be convenient when surveying this
yard. Use a basis suited to the job.


Before proving that these conditions are indeed equivalent, let’s see some 
examples. 

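Example 2.4.14 (Standard basis). The fundamental example of a basis is
the standard basis of $\mathbb{R}^n$; our vectors are already lists of numbers, written with
respect to the "standard basis" of standard basis vectors, $\vec e_1, \dots, \vec e_n$.

Clearly every vector in $\mathbb{R}^n$ is in the span of $\vec e_1, \dots, \vec e_n$:

$$\begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} = a_1\vec e_1 + \dots + a_n\vec e_n; \tag{2.4.22}$$

and it is equally clear that $\vec e_1, \dots, \vec e_n$ are linearly independent (Exercise 2.4.3).

Example 2.4.15 (Basis formed of n vectors in $\mathbb{R}^n$). The standard basis
is not the only one. For instance,

$\begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \end{bmatrix}$ form a basis in $\mathbb{R}^2$, as do $\begin{bmatrix} 2 \\ 0 \end{bmatrix}, \begin{bmatrix} 0.5 \\ -3 \end{bmatrix}$ (but not $\begin{bmatrix} 2 \\ 0 \end{bmatrix}, \begin{bmatrix} 0.5 \\ 0 \end{bmatrix}$). △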

In general, if you choose at random n vectors in $\mathbb{R}^n$, they will form a basis.
In $\mathbb{R}^2$, the odds are completely against picking two vectors on the same line; in
$\mathbb{R}^3$ the odds are completely against picking three vectors in the same plane.

You might think that the standard basis should be enough. But there are
times when a problem becomes much more straightforward in a different basis.
The best examples of this are beyond the scope of this chapter (eigenvectors,
orthogonal polynomials), but a simple case is illustrated by Figure 2.4.2. (Think
also of decimals and fractions. It is a great deal simpler to write 1/7 than
0.142857142857 . . . , yet at other times computing with decimals is easier.)

In addition, for a subspace $V \subset \mathbb{R}^n$ it is usually inefficient to describe vectors
using all n numbers:


Example 2.4.16 (Using two basis vectors in a subspace of R 3 ). In the 
subspace V C R 3 of equation x + y + z = 0, rather than writing a vector by 
giving its three entries, we could write them using only two coefficients, a and 

r 1 \ 

6, and the vectors wi — I — 1 I and W 2 = I 0 . For instance, 


$$\vec v = a\vec w_1 + b\vec w_2. \tag{2.4.23}$$

What other two vectors might you choose as a basis for V?$^{9}$


One student asked, “When n > 
3, how can vectors be orthogonal, 
or is that some weird math thing 
you just can’t visualize?” Pre- 
cisely! Two vectors are orthogonal 
if their dot product is 0. In R 2 and 
R 3 , this corresponds to the vectors 
being perpendicular to each other. 
The geometrical relation in higher 
dimensions is analogous to that in 
R 2 and R 3 , but you shouldn’t ex- 
pect to be able to visualize 4 (or 
17, or 98) vectors all perpendicu- 
lar to each other. 

For a long time, the impossi- 
bility of visualizing higher dimen- 
sions hobbled mathematicians. In 
1827, August Moebius wrote that 
if two flat figures can be moved in 
space so that they coincide at ev- 
ery point, they are “equal and sim- 
ilar.” To speak of equal and sim- 
ilar objects in three dimensions, 
he continued, one would have to 
be able to move them in four- 
dimensional space, to make them 
coincide. “But since such space 
cannot be imagined, coincidence 
in this case is impossible.” 


Orthonormal bases 

When doing geometry, it is almost always best to work with an orthonormal 
basis. Below, recall that two vectors are orthogonal if their dot product is zero 
(Corollary 1.4.80). 

Definition 2.4.17 (Orthonormal basis). A basis $\vec v_1, \vec v_2, \dots, \vec v_k$ of a subspace
$V \subset \mathbb{R}^n$ is orthonormal if each vector in the basis is orthogonal to
every other vector in the basis, and if all basis vectors have length 1:

$$\vec v_i \cdot \vec v_j = 0 \ \text{ for } i \ne j \quad\text{and}\quad |\vec v_i| = 1 \ \text{ for all } i \le k.$$

The standard basis is of course orthonormal.

The reason orthonormal bases are interesting is that the length squared of
a vector is the sum of the squares of its coordinates, with respect to any
orthonormal basis. If $\vec v_1, \dots, \vec v_k$ and $\vec w_1, \dots, \vec w_k$ are two orthonormal bases, and

$$a_1\vec v_1 + \dots + a_k\vec v_k = b_1\vec w_1 + \dots + b_k\vec w_k, \quad\text{then}\quad a_1^2 + \dots + a_k^2 = b_1^2 + \dots + b_k^2.$$

The proof is left as Exercise 2.4.4.
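A small numerical check of this claim, offered as a sketch rather than a proof: the basis below (the vectors (3/5, 4/5) and (-4/5, 3/5)) and the vector x are chosen here only because they give exact rational arithmetic. It relies on the standard fact that coordinates with respect to an orthonormal basis are obtained by taking dot products with the basis vectors.

```python
from fractions import Fraction as F

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

# An orthonormal basis of R^2 with rational entries (an illustrative choice).
v1 = [F(3, 5), F(4, 5)]
v2 = [F(-4, 5), F(3, 5)]
is_orthonormal = dot(v1, v2) == 0 and dot(v1, v1) == 1 and dot(v2, v2) == 1

x = [2, 1]
# Coordinates of x with respect to v1, v2.
a, b = dot(x, v1), dot(x, v2)

length_squared = dot(x, x)       # computed from standard coordinates
sum_of_squares = a * a + b * b   # computed from coordinates in the new basis
print(is_orthonormal, length_squared == sum_of_squares)
```

The standard basis is itself orthonormal, so both numbers compare the same length in two orthonormal coordinate systems.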

If all vectors in the basis are orthogonal to each other, but they don't all have
length 1, then the basis is orthogonal. Is either of the two bases of Example
2.4.15 orthogonal? orthonormal?$^{10}$


Proposition 2.4.18. An orthogonal set of nonzero vectors $\vec v_1, \dots, \vec v_k$ is
linearly independent.

Proof. Suppose $a_1\vec v_1 + \dots + a_k\vec v_k = \vec 0$. Take the dot product of both sides
with $\vec v_i$:

$$(a_1\vec v_1 + \dots + a_k\vec v_k) \cdot \vec v_i = \vec 0 \cdot \vec v_i = 0. \tag{2.4.24}$$


9 The vectors 


0 

1 

-1 


are a basis for V, as are 


-1/2 1 


r 

-1/2 

t 

-i 

1 


0 


; the vectors 


-1 
0 
1. 

just need to be linear 

be 0 (satisfying x + y + z = 0). Part of the “structure” of the subspace V is thus built 
into the basis vectors. 

$^{10}$ The first is orthogonal, since $\begin{bmatrix} 1 \\ 1 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ -1 \end{bmatrix} = 1 - 1 = 0$; the second is not, since
$\begin{bmatrix} 2 \\ 0 \end{bmatrix} \cdot \begin{bmatrix} 0.5 \\ -3 \end{bmatrix} = 1 + 0 = 1$. Neither is orthonormal; the vectors of the first basis each
have length $\sqrt 2$, and those of the second have lengths 2 and $\sqrt{9.25}$.

The surprising thing about
Proposition 2.4.18 is that it allows
us to assert that a set of vectors
is linearly independent looking
only at pairs of vectors. It
is of course not true that if you
have a set of vectors and every pair
is linearly independent, the whole
set is linearly independent; consider,
for instance, the three vectors
$\begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$.

By "nontrivial," we mean a solution
other than

$$a_1 = a_2 = \dots = a_n = b = 0.$$


So

$$a_1(\vec v_1 \cdot \vec v_i) + \dots + a_i(\vec v_i \cdot \vec v_i) + \dots + a_k(\vec v_k \cdot \vec v_i) = 0. \tag{2.4.25}$$

Since the $\vec v_j$ form an orthogonal set, all the dot products on the left are zero
except for the ith, so $a_i(\vec v_i \cdot \vec v_i) = 0$. Since the vectors are assumed to be
nonzero, this says that $a_i = 0$. □
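The dot-product step of this proof also gives a practical recipe: for an orthogonal set of nonzero vectors, the coefficient of $\vec v_i$ in a linear combination $\vec u$ is $(\vec u \cdot \vec v_i)/(\vec v_i \cdot \vec v_i)$. The sketch below illustrates this on a set of vectors chosen here for the purpose; the function names are hypothetical.

```python
from fractions import Fraction

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

# Three pairwise orthogonal, nonzero vectors in R^3 (an illustrative choice).
vs = [[1, 1, 0], [1, -1, 0], [0, 0, 2]]
pairwise_orthogonal = all(dot(vs[i], vs[j]) == 0
                          for i in range(3) for j in range(3) if i != j)

def coefficient(u, v):
    """Coefficient of v when u is a combination of an orthogonal set containing v."""
    return Fraction(dot(u, v), dot(v, v))

u = [-1, 5, 1]                       # equals 2*v1 - 3*v2 + (1/2)*v3
coeffs = [coefficient(u, v) for v in vs]

# Applied to the zero vector, the same recipe forces every coefficient to be 0,
# which is the statement of linear independence.
zero_coeffs = [coefficient([0, 0, 0], v) for v in vs]
print(coeffs, zero_coeffs)
```

Only one dot product per basis vector is needed; no system of equations has to be row reduced.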

Equivalence of the three conditions for a basis 

We need to show that the three conditions for a basis given in Definition 2.4.13 
are indeed equivalent. 

We will show that (1) implies (2): that if a set of vectors is a maximal linearly
independent set, it is a minimal spanning set. Let $V \subset \mathbb{R}^n$ be a subspace. If
an ordered set of vectors $\vec v_1, \dots, \vec v_k \in V$ is a maximal linearly independent set,
then for any other vector $\vec w \in V$, the set $\{\vec v_1, \dots, \vec v_k, \vec w\}$ is linearly dependent,
and (by Definition 2.4.11) there exists a nontrivial relation

$$a_1\vec v_1 + \dots + a_k\vec v_k + b\vec w = \vec 0. \tag{2.4.26}$$

The coefficient b is not zero, because if it were, the relation would then involve
only the $\vec v$'s, which are linearly independent by hypothesis. Therefore we can
divide through by b, expressing $\vec w$ as a linear combination of the $\vec v$'s:

$$\frac{a_1}{b}\vec v_1 + \dots + \frac{a_k}{b}\vec v_k = -\vec w. \tag{2.4.27}$$

Since $\vec w \in V$ can be any vector in V, we see that the $\vec v$'s do span V.

Moreover, $\vec v_1, \dots, \vec v_k$ is a minimal spanning set: if one of the $\vec v_i$'s is
omitted, the set no longer spans, since the omitted $\vec v_i$ is linearly independent
of the others and hence cannot be in the span of the others.

This shows that (1) implies (2); the other implications are similar and left
as Exercise 2.4.7.


Now we can restate Theorem 2.4.12: 


Corollary 2.4.19. Every basis of $\mathbb{R}^n$ has exactly n elements.

Indeed a set of vectors in $\mathbb{R}^n$ never spans $\mathbb{R}^n$ if it has fewer than n elements,
and it is never linearly independent if it has more than n elements (see Theorem
2.4.12).

The notion of the dimension of a subspace will allow us to talk about such
things as the size of the space of solutions to a set of equations, or the number
of genuinely different equations.

Proposition and Definition 2.4.20 (Dimension). Every subspace
$E \subset \mathbb{R}^n$ has a basis, and any two bases of a subspace E have the same number of
elements, called the dimension of E.

We can express any $\vec w_j$ as a linear
combination of the $\vec v_i$ because
the $\vec v_i$ span E.

It might seem more natural to
express the $\vec w$'s as linear combinations
of the $\vec v$'s in the following
way:

$$\begin{aligned} \vec w_1 &= a_{1,1}\vec v_1 + \dots + a_{1,k}\vec v_k \\ &\ \ \vdots \\ \vec w_p &= a_{p,1}\vec v_1 + \dots + a_{p,k}\vec v_k. \end{aligned}$$

The a's then form the transpose of
the matrix A we have written. We use
A because it is the change of basis
matrix, which we will see again in
Section 2.6 (Theorem 2.6.16).

The sum in Equation 2.4.28 is
not a matrix multiplication. For
one thing, it is the sum of products
of numbers with vectors; for another,
the indices are in the wrong
order.


Proof. First we construct a basis of E. If $E = \{\vec 0\}$, the empty set is a basis of
E. Otherwise, choose a sequence of vectors $\vec v_1, \vec v_2, \dots$ in E as follows: choose
$\vec v_1 \ne \vec 0$, then $\vec v_2 \notin \mathrm{Sp}(\vec v_1)$, then $\vec v_3 \notin \mathrm{Sp}(\vec v_1, \vec v_2)$, etc. Vectors chosen this
way are clearly linearly independent. Therefore we can choose at most n such
vectors, and for some $m \le n$, $\vec v_1, \dots, \vec v_m$ will span E. (If they don't span, we
can choose another.) Since these vectors are linearly independent, they form a
basis of E.

Now to see that any two bases have the same number of elements. Suppose
$\vec v_1, \dots, \vec v_k$ and $\vec w_1, \dots, \vec w_p$ are two bases of E. Then there exists a $k \times p$ matrix
A with entries $a_{i,j}$ such that

$$\vec w_j = \sum_{i=1}^{k} a_{i,j}\vec v_i, \tag{2.4.28}$$

i.e., that $\vec w_j$ can be expressed as a linear combination of the $\vec v_i$. There also
exists a $p \times k$ matrix B with entries $b_{l,i}$ such that

$$\vec v_i = \sum_{l=1}^{p} b_{l,i}\vec w_l. \tag{2.4.29}$$

Substituting the value for $\vec v_i$ of Equation 2.4.29 into Equation 2.4.28 gives

$$\vec w_j = \sum_{i=1}^{k} a_{i,j} \sum_{l=1}^{p} b_{l,i}\vec w_l = \sum_{l=1}^{p} \Bigl(\underbrace{\sum_{i=1}^{k} b_{l,i}a_{i,j}}_{l,j\text{th entry of } BA}\Bigr)\vec w_l. \tag{2.4.30}$$

This expresses $\vec w_j$ as a linear combination of the $\vec w$'s, but since the $\vec w$'s are
linearly independent, Equation 2.4.30 must read

$$\vec w_j = 0\vec w_1 + 0\vec w_2 + \dots + 1\vec w_j + \dots + 0\vec w_p. \tag{2.4.31}$$

So $\bigl(\sum_i b_{l,i}a_{i,j}\bigr)$ is 0, unless $l = j$, in which case it is 1. In other words, $BA = I$.
The same argument, exchanging the roles of the $\vec v$'s and the $\vec w$'s, shows that
$AB = I$. Thus A is invertible, hence square, and $k = p$. □


Corollary 2.4.21. The only n-dimensional subspace of $\mathbb{R}^n$ is $\mathbb{R}^n$ itself.


Remark. We said earlier that the terms linear combinations, span, and linear
independence give a precise way to answer the question: given a collection of
linear equations, how many genuinely different equations do we have? We have
seen that row reduction provides a systematic way to determine how many of
the columns of a matrix are linearly independent. But the equations correspond
to the rows of a matrix, not to its columns. In the next section we will see that
the number of linearly independent equations in the system $A\vec x = \vec b$ is the same
as the number of linearly independent columns in A.

2.5 Kernels and Images


The kernel and the image of a linear transformation are important but rather 
abstract concepts. They are best understood in terms of linear equations. Ker- 
nels are related to uniqueness of solutions of linear equations, whereas images 
are related to their existence. 


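The kernel is sometimes called
the "null-space." In Definition
2.5.1, the linear transformation T
is represented by a matrix [T].

The vector $\begin{bmatrix} -2 \\ -1 \\ 3 \end{bmatrix}$ is in the kernel of
$\begin{bmatrix} 1 & 1 & 1 \\ 2 & -1 & 1 \end{bmatrix}$, because

$$\begin{bmatrix} 1 & 1 & 1 \\ 2 & -1 & 1 \end{bmatrix} \begin{bmatrix} -2 \\ -1 \\ 3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$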


Definition 2.5.1 (Kernel). The kernel of a linear transformation T,
denoted ker T, is the set of vectors $\vec x$ such that $T(\vec x) = \vec 0$. When T is
represented by a matrix [T], the kernel is the set of solutions to the system of
linear equations $[T]\vec x = \vec 0$.

Kernels are a way to talk about uniqueness of solutions of linear equations.

Proposition 2.5.2. The system of linear equations $T(\vec x) = \vec b$ has at most
one solution for every $\vec b$ if and only if $\ker T = \{\vec 0\}$ (that is, if the only vector
in the kernel is the zero vector).

Proof. If the kernel of T is not $\{\vec 0\}$, then there is more than one solution to
$T(\vec x) = \vec 0$ (the other one being of course $\vec x = \vec 0$).

In the other direction, if there exists a $\vec b$ for which $T(\vec x) = \vec b$ has more than
one solution, i.e.,

$$T(\vec x_1) = T(\vec x_2) = \vec b \ \text{ and } \ \vec x_1 \ne \vec x_2, \quad\text{then}\quad T(\vec x_1 - \vec x_2) = T(\vec x_1) - T(\vec x_2) = \vec b - \vec b = \vec 0. \tag{2.5.1}$$

So $\vec x_1 - \vec x_2$ is a nonzero element of the kernel. □

It is the same to say that $\vec b$ is
in the image of A and to say that
$\vec b$ is in the span of the columns of
A. We can rewrite $A\vec x = \vec b$ as

$$\vec a_1 x_1 + \vec a_2 x_2 + \dots + \vec a_n x_n = \vec b.$$

If $A\vec x = \vec b$, then $\vec b$ is in the span of
$\vec a_1, \dots, \vec a_n$, since it can be written
as a linear combination of those
vectors.
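Proposition 2.5.2 can be seen at work on the small matrix from the margin example above: adding a nonzero kernel element to any solution gives a second, different solution with the same right-hand side. This is a minimal sketch; the starting vector `x1` is an arbitrary choice made here.

```python
def apply(A, x):
    """Multiply the matrix A (a list of rows) by the vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

A = [[1, 1, 1],
     [2, -1, 1]]
k = [-2, -1, 3]   # a nonzero element of the kernel of A

x1 = [1, 0, 2]                           # any vector will do
x2 = [a + b for a, b in zip(x1, k)]      # a different vector: x1 + k

b1, b2 = apply(A, x1), apply(A, x2)
# Both vectors solve A x = b for the same b, so the solution is not unique.
print(apply(A, k), b1, b2)
```

Conversely, the difference of any two solutions of the same system lands in the kernel, which is the second half of the proof.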


The image of a transformation is a way to talk about existence of solutions. 


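Definition 2.5.3 (Image). The image of T, denoted Img T, is the set of
vectors $\vec b$ for which there exists a solution of $T(\vec x) = \vec b$.

For example, $\begin{bmatrix} 7 \\ 2 \end{bmatrix}$ is in the image of $\begin{bmatrix} 1 & 3 \\ 2 & 0 \end{bmatrix}$, since

$$\begin{bmatrix} 1 & 3 \\ 2 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 7 \\ 2 \end{bmatrix}. \tag{2.5.2}$$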


Remark. The word image is not restricted to linear algebra; for example, the
image of $f(x) = x^2$ is the set of nonnegative reals. △

The image is sometimes called
the "range"; this usage is a source
of confusion since many authors,
including ourselves, use "range"
to mean the entire set of arrival:
the range of a transformation
$T : \mathbb{R}^n \to \mathbb{R}^m$ is $\mathbb{R}^m$.

The image of T is sometimes
denoted im T, but we will stick to
Img T to avoid confusion with "the
imaginary part," which is also denoted
im. For any complex matrix,
both "image" and "imaginary
part" make sense.


The following statements are 
equivalent: 

(1) the kernel of A is 0; 

(2) the only solution to the 
equation Ax = 0 is x = 0; 

(3) the columns $\vec a_i$ making up
A are linearly independent;

(4) the transformation given by 
A is one to one; 

(5) the transformation given by 
A is injective; 

(6) if the equation Ax = b has 
a solution, it is unique. 


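Given the matrix and vectors below, which if any vectors are in the kernel
of A? Check your answer below.$^{11}$

$$A = \begin{bmatrix} 1 & 0 & 1 & 1 \\ 2 & 1 & 1 & 3 \\ 1 & 0 & 2 & 2 \end{bmatrix}, \quad \vec v_1 = \begin{bmatrix} 0 \\ 2 \\ 1 \\ -1 \end{bmatrix}, \quad \vec v_2 = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \quad \vec v_3 = \begin{bmatrix} 0 \\ 4 \\ 2 \\ -2 \end{bmatrix}.$$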


The image and kernel of a linear transformation provide a third language 
for talking about existence and uniqueness of solutions to linear equations, as 
summarized in Figure 2.5.1. It is important to master all three and understand 
their equivalence. We may think of the first language as computational: does a
system of linear equations have a solution? Is it unique? The second language, 
that of span and linear independence of vectors, is more algebraic. 

The third language, that of image and kernel, is more geometric, concerning 
subspaces. The kernel is a subspace of the domain (set of departure) of a linear 
transformation; the image is a subspace of its range (set of arrival). 


Algorithms                 Algebra                      Geometry
-------------------------  ---------------------------  ----------
Row reduction              Inverses of matrices,        Subspaces
                           solving linear equations
Existence of solutions     Span                         Images
Uniqueness of solutions    Linear independence          Kernels

FIGURE 2.5.1. Three languages for discussing solutions to linear equations:
algorithms, algebra, geometry


This may be clearer if we write our definitions more precisely, specifying
the domain and range of our transformation. Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear
transformation given by the $m \times n$ matrix [T]. Then:

(1) The kernel of T is the set of all vectors $\vec v \in \mathbb{R}^n$ such that $[T]\vec v = \vec 0$.
(Note that the vectors in the kernel are in $\mathbb{R}^n$, the domain of the
transformation.)

(2) The image of T is the set of vectors $\vec w \in \mathbb{R}^m$ such that there is a vector
$\vec v \in \mathbb{R}^n$ with $[T]\vec v = \vec w$. (Note that the vectors in the image are in $\mathbb{R}^m$,
the range of T.)


$^{11}$ The vectors $\vec v_1$ and $\vec v_3$ are in the kernel of A, since $A\vec v_1 = \vec 0$ and $A\vec v_3 = \vec 0$. But
$\vec v_2$ is not, since $A\vec v_2 = \begin{bmatrix} 2 \\ 3 \\ 3 \end{bmatrix}$. The vector $\begin{bmatrix} 2 \\ 3 \\ 3 \end{bmatrix}$ is in the image of A.

Proposition 2.5.4 means that
the kernel and the image are closed
under addition and under multiplication
by scalars; if you add two
elements of the kernel you get an
element of the kernel, and so on.

Thus by definition, the kernel of a transformation is a subset of its domain,
and the image is a subset of its range. In fact, they are also subspaces of the
domain and range respectively.

Proposition 2.5.4. If $T : \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation given by the
$m \times n$ matrix A, then the kernel of A is a subspace of $\mathbb{R}^n$, and the image of
A is a subspace of $\mathbb{R}^m$.


The proof is left as Exercise 2.5.1. 

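Given the vectors and the matrix T below:

$$T = \begin{bmatrix} 2 & -1 & 3 & 2 & 1 \\ 1 & 0 & 1 & 3 & 0 \\ 2 & -1 & 1 & 0 & 1 \end{bmatrix}, \quad \vec w_1 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \quad \vec w_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 2 \end{bmatrix}, \quad \vec w_3 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad \vec w_4 = \begin{bmatrix} -2 \\ 1 \\ 2 \\ 0 \\ 0 \end{bmatrix},$$

which vectors have the right height to be in the kernel of T? To be in its image?
Can you find an element of its kernel? Check your answer below.$^{12}$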


Finding bases for the image and kernel 

Suppose A is the matrix of T. If we row reduce A to echelon form $\tilde A$, we can
find a basis for the image, using the following theorem. Recall (Definition 2.2.1)
that a pivotal column of A is one whose corresponding column in $\tilde A$ contains a
pivotal 1.

Theorem 2.5.5 (A basis for the image). The pivotal columns of A form
a basis for Img A.

We will prove this theorem, and the analogous theorem for the kernel, after 
giving some examples. 

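$^{12}$ The matrix T represents a transformation from $\mathbb{R}^5$ to $\mathbb{R}^3$; it takes a vector in
$\mathbb{R}^5$ and gives a vector in $\mathbb{R}^3$. Therefore $\vec w_4$ has the right height to be in the kernel
(although it isn't), and $\vec w_1$ and $\vec w_3$ have the right height to be in its image. Since the
sum of the second and fifth columns of T is $\vec 0$, one element of the kernel is
$\begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 1 \end{bmatrix}$.

Example 2.5.6 (Finding a basis for the image). Consider the matrix A
below, which describes a linear transformation from $\mathbb{R}^5$ to $\mathbb{R}^4$:

$$A = \begin{bmatrix} 1 & 0 & 1 & 3 & 0 \\ 0 & 1 & 1 & 2 & 0 \\ 1 & 1 & 2 & 5 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \quad\text{which row reduces to}\quad \tilde A = \begin{bmatrix} 1 & 0 & 1 & 3 & 0 \\ 0 & 1 & 1 & 2 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$

The pivotal 1's of the row-reduced matrix $\tilde A$ are in columns 1, 2 and 5, so
columns 1, 2 and 5 of the original matrix A are a basis for the image:

$$\begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}, \quad\text{and}\quad \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}. \tag{2.5.3}$$

The vectors of Equation 2.5.3
are not the only basis for the image.

Note that while the pivotal
columns of the original matrix A
form a basis for the image, it is
not necessarily the case that the
columns of the row reduced matrix
$\tilde A$ containing pivotal 1's form such
a basis. For example, the matrix
$\begin{bmatrix} 1 & 1 \\ 2 & 2 \end{bmatrix}$ row reduces to $\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}$.
The vector $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$ forms a basis for
the image, but $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ does not.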


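For example, the $\vec w$ below, which is in the image of A, can be expressed
uniquely as a linear combination of the image basis vectors:

$$\vec w = 2\vec a_1 + \vec a_2 - \vec a_3 + 2\vec a_4 - 3\vec a_5 = \begin{bmatrix} 7 \\ 4 \\ 8 \\ 0 \end{bmatrix} = 7\begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix} + 4\begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix} - 3\begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}. \tag{2.5.4}$$

Note that each vector in the basis for the image has four entries, as it must,
since the image is a subspace of $\mathbb{R}^4$. (The image is not of course $\mathbb{R}^4$ itself; a
basis for $\mathbb{R}^4$ must have four elements.) △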
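Theorem 2.5.5 translates directly into a procedure: row reduce, note which columns are pivotal, and take those columns of the original matrix. The sketch below applies it to the matrix of Example 2.5.6. The helper names `rref` and `image_basis` are written here for illustration and are not part of the text.

```python
from fractions import Fraction

def rref(rows):
    """Row reduce; return the reduced matrix and the 0-based pivotal columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # column c is non-pivotal
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots

def image_basis(A):
    """Pivotal columns of the ORIGINAL matrix A (not of its row reduction)."""
    _, pivots = rref(A)
    return [[row[c] for row in A] for c in pivots]

A = [[1, 0, 1, 3, 0],
     [0, 1, 1, 2, 0],
     [1, 1, 2, 5, 1],
     [0, 0, 0, 0, 0]]
_, pivots = rref(A)
basis = image_basis(A)
print(pivots, basis)
```

Note that the columns are taken from `A` itself; the pivotal columns of the reduced matrix would in general span a different subspace.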


A basis for the kernel 

Finding a basis for the kernel is more complicated; you may find it helpful to 
refer to Example 2.5.8 to understand the statement of Theorem 2.5.7. 

A basis for the kernel is of course a set of vectors such that any vector 
in the kernel (any vector w satisfying Aw = 0) can be expressed as a linear 
combination of those basis vectors. The basis vectors must themselves be in 
the kernel, and they must be linearly independent. 

Theorem 2.2.4 says that if a system of linear equations has a solution, then 
it has a unique solution for any value you choose of the non-pivotal unknowns. 
Clearly Aw = 0 has a solution, namely w = 0. So the tactic is to choose 
the values of the non-pivotal unknowns in a convenient way. We take our 
inspiration from the standard basis vectors, which each have one entry equal to 
1, and the others 0. We construct one vector for each non-pivotal column, by 
setting the entry corresponding to that non-pivotal unknown to be 1, and the 
entries corresponding to the other non-pivotal unknowns to be 0. (The entries 
corresponding to the pivotal unknowns will be whatever they have to be to
satisfy the equation $A\vec v_i = \vec 0$.)

Theorem 2.5.7 (A basis for the kernel). Let p be the number of
non-pivotal columns of A, and $k_1, \dots, k_p$ be their positions. For each non-pivotal
column form the vector $\vec v_i$ satisfying $A\vec v_i = \vec 0$, and such that its $k_i$th entry
is 1, and its $k_j$th entries are all 0, for $j \ne i$. The vectors $\vec v_1, \dots, \vec v_p$ form a
basis of ker A.

In Example 2.5.6, the matrix A
has two non-pivotal columns, so
p = 2; those two columns are the
third and fourth columns of A, so
$k_1 = 3$ and $k_2 = 4$.

An equation $A\vec x = \vec 0$ (i.e., $A\vec x = \vec b$
where $\vec b = \vec 0$) is called homogeneous.

These two vectors are clearly
linearly independent; no "linear
combination" of $\vec v_1$ could produce
the 1 in the fourth entry of $\vec v_2$,
and no "linear combination" of $\vec v_2$
could produce the 1 in the third
entry of $\vec v_1$. Basis vectors found
using the technique given in Theorem
2.5.7 will always be linearly
independent, since for each entry
corresponding to a non-pivotal unknown,
one basis vector will have
1 and all the others will have 0.


We prove Theorem 2.5.7 below. First, an example. 

Example 2.5.8 (Finding a basis for the kernel). The third and fourth
columns of A in Example 2.5.6 above are non-pivotal, so the system has a
unique solution for any values we choose of the third and fourth unknowns. In
particular, there is a unique vector $\vec v_1$ whose third entry is 1 and fourth entry
is 0, such that $A\vec v_1 = \vec 0$. There is another, $\vec v_2$, whose fourth entry is 1 and third
entry is 0, such that $A\vec v_2 = \vec 0$:

$$\vec v_1 = \begin{bmatrix} - \\ - \\ 1 \\ 0 \\ - \end{bmatrix}, \qquad \vec v_2 = \begin{bmatrix} - \\ - \\ 0 \\ 1 \\ - \end{bmatrix}.$$

Now we need to fill in the blanks, finding the first, second, and fifth entries of
these vectors, which correspond to the pivotal unknowns. We read these values
from the first three rows of $[\tilde A, \vec 0]$ (remembering that a solution for $\tilde A\vec x = \vec 0$ is
also a solution for $A\vec x = \vec 0$):

$$[\tilde A, \vec 0] = \begin{bmatrix} 1 & 0 & 1 & 3 & 0 & 0 \\ 0 & 1 & 1 & 2 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \quad\text{i.e.,}\quad \begin{aligned} x_1 + x_3 + 3x_4 &= 0 \\ x_2 + x_3 + 2x_4 &= 0 \\ x_5 &= 0, \end{aligned} \tag{2.5.6}$$

which gives

$$\begin{aligned} x_1 &= -x_3 - 3x_4 \\ x_2 &= -x_3 - 2x_4 \\ x_5 &= 0. \end{aligned} \tag{2.5.7}$$

So for $\vec v_1$, where $x_3 = 1$ and $x_4 = 0$, the first entry is $x_1 = -1$, the second is
-1 and the fifth is 0; the corresponding entries for $\vec v_2$ are -3, -2 and 0:

$$\vec v_1 = \begin{bmatrix} -1 \\ -1 \\ 1 \\ 0 \\ 0 \end{bmatrix}; \qquad \vec v_2 = \begin{bmatrix} -3 \\ -2 \\ 0 \\ 1 \\ 0 \end{bmatrix}. \tag{2.5.8}$$

These two vectors form a basis of the kernel of A. For example, the vector
$\vec v = \begin{bmatrix} 0 \\ -1 \\ 3 \\ -1 \\ 0 \end{bmatrix}$ is in the kernel of A, since $A\vec v = \vec 0$, so it should be possible to
express $\vec v$ as a linear combination of the vectors of the basis for the kernel.
Indeed it is: $\vec v = 3\vec v_1 - \vec v_2$. △

Note that each vector in the
basis for the kernel has five entries,
as it must, since the domain of the
transformation is $\mathbb{R}^5$.
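Now find a basis for the image and kernel of the following matrix:

$$\begin{bmatrix} 2 & 1 & 3 & 1 \\ 1 & -1 & 0 & 1 \\ 1 & 1 & 2 & 1 \end{bmatrix}, \quad\text{which row reduces to}\quad \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \tag{2.5.9}$$

checking your answer below.$^{13}$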

Proof of Theorem 2.5.5 (A basis for the image). Let $A = [\vec a_1 \dots \vec a_m]$.
To prove that the pivotal columns of A form a basis for the image of A we need
to prove: (1) that the pivotal columns of A are in the image, (2) that they are
linearly independent and (3) that they span the image.

(1) The pivotal columns of A (in fact, all columns of A) are in the image,
since $A\vec e_i = \vec a_i$.

(2) The vectors are linearly independent, since when all non-pivotal entries
of $\vec x$ are 0, the only solution of $A\vec x = \vec 0$ is $\vec x = \vec 0$. (If the pivotal
unknowns are also 0, i.e., if $\vec x = \vec 0$, then clearly $A\vec x = \vec 0$. This is the only
such solution, because the system has a unique solution for each value
we choose of the non-pivotal unknowns.)

(3) They span the image, since each non-pivotal vector $\vec v_k$ is a linear
combination of the preceding pivotal ones (Equation 2.2.8). □

Proof of Theorem 2*5.7 (A basis for the kernel). Similarly, to prove 
that the vectors v, = Vi , . . . , v p form a basis for the kernel of A, we must show 


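13 The vectors $\begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ form a basis for the image; the vector $\begin{bmatrix} -1 \\ -1 \\ 1 \\ 0 \end{bmatrix}$ is a basis for the kernel. The row-reduced matrix $[\widetilde A, \vec 0]$ is

$$\begin{bmatrix} 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}, \quad \text{i.e.,} \quad \begin{aligned} x_1 + x_3 &= 0 \\ x_2 + x_3 &= 0 \\ x_4 &= 0. \end{aligned}$$

The third column of the original matrix is non-pivotal, so for the vector of the basis of the kernel we set $x_3 = 1$, which gives $x_1 = -1$, $x_2 = -1$.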
For a transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ the following statements are equivalent:

(1) the columns of $[T]$ span $\mathbb{R}^m$;

(2) the image of $T$ is $\mathbb{R}^m$;

(3) the transformation $T$ is onto;

(4) the transformation $T$ is surjective;

(5) the rank of $T$ is $m$;

(6) the dimension of $\operatorname{Img}(T)$ is $m$;

(7) the row reduced matrix $\widetilde T$ has no row containing all zeroes;

(8) the row reduced matrix $\widetilde T$ has a pivotal 1 in every row.

For a transformation from $\mathbb{R}^n$ to $\mathbb{R}^n$, if $\ker(T) = \{\vec 0\}$, then the image is all of $\mathbb{R}^n$.

Recall (Definition 2.4.20) that the dimension of a subspace of $\mathbb{R}^n$ is the number of basis vectors of the subspace. It is denoted dim.

The dimension formula says 
there is a conservation law con- 
cerning the kernel and the image: 
saying something about unique- 
ness says something about exis- 
tence. 


The rank of a matrix is the 
most important number to asso- 
ciate to it. 


that they are in the kernel, that they are linearly independent, and that they span the kernel.

(1) By definition, $A\vec v_i = \vec 0$, so $\vec v_i \in \ker A$.

(2) As pointed out in Example 2.5.8, the $\vec v_i$ are linearly independent, since exactly one has a nonzero number in each position corresponding to a non-pivotal unknown.

(3) Saying that the $\vec v_i$ span the kernel means that any $\vec x$ such that $A\vec x = \vec 0$ can be written as a linear combination of the $\vec v_i$. Indeed, suppose that $A\vec x = \vec 0$. We can construct a vector $\vec w = x_{k_1}\vec v_1 + x_{k_2}\vec v_2 + \dots + x_{k_p}\vec v_p$ that has the same entry $x_{k_i}$ in the non-pivotal column $k_i$ as does $\vec x$. Since $A\vec v_i = \vec 0$, we have $A\vec w = \vec 0$. But for each value of the non-pivotal variables, there is a unique vector $\vec x$ such that $A\vec x = \vec 0$. Therefore $\vec x = \vec w$. □

Uniqueness and existence: the dimension formula 

Much of the power of linear algebra comes from the following theorem, known 
as the dimension formula. 

Theorem 2.5.9 (Dimension formula). Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation. Then

$$\dim(\ker T) + \dim(\operatorname{Img} T) = n, \text{ the dimension of the domain.} \tag{2.5.10}$$


Definition 2.5.10 (Rank and Nullity). The dimension of the image of 
a linear transformation is called its rank, and the dimension of its kernel is 
called its nullity. 

Thus the dimension formula says that for any linear transformation, the rank 
plus the nullity is equal to the dimension of the domain. 

Proof. Suppose T is given by the matrix A. Then, by Theorems 2.5.5 and 
2.5.7 above, the image has one basis vector for each pivotal column of A, and 
the kernel has one basis vector for each non-pivotal column, so in all we find 

dim (ker T) + dim (Img T) = number of columns of A = n. □ 

Given a transformation $T$ represented by a $3 \times 4$ matrix $[T]$ with rank 2, what are the domain and range of the transformation? What is the dimension of its kernel? Is it onto? Check your answers below. 14

14 The domain of $T$ is $\mathbb{R}^4$ and its range is $\mathbb{R}^3$. The dimension of its kernel is 2, since the dimension of the kernel and that of the image sum to the dimension of the domain. The transformation is not onto: its image has dimension 2, while a basis for $\mathbb{R}^3$ must have three basis elements.
The power of linear algebra
comes from Corollary 2.5.11. See 
Example 2.5.14, and Exercises 
2.5.10, 2.5.16 and 2.5.17. These 
exercises deduce major mathemat- 
ical results from this corollary. 


Since Corollary 2.5.11 is an “if 
and only if” statement, it can also 
be used to deduce uniqueness from 
existence; in practice this is not 
quite so useful. 


The most important case of the dimension formula is when the domain and range have the same dimension. In this case, one can deduce existence of solutions from uniqueness, and vice versa. Most often, the first approach is most useful; it is often easier to prove that $T(\vec x) = \vec 0$ has a unique solution than it is to construct a solution of $T(\vec x) = \vec b$. It is quite remarkable that knowing that $T(\vec x) = \vec 0$ has a unique solution guarantees existence of solutions for all $T(\vec x) = \vec b$. This is, of course, an elaboration of Theorem 2.2.4. But that theorem depends on knowing a matrix. Corollary 2.5.11 can be applied when there is no matrix to write down, as we will see in Example 2.5.14, and in exercises mentioned at left.


Corollary 2.5.11 (Deducing existence from uniqueness). If $T : \mathbb{R}^n \to \mathbb{R}^n$ is a linear transformation, then the equation $T(\vec x) = \vec b$ has a solution for any $\vec b \in \mathbb{R}^n$ if and only if the only solution to the equation $T(\vec x) = \vec 0$ is $\vec x = \vec 0$ (i.e., if the kernel is zero).

Proof. Saying that $T(\vec x) = \vec b$ has a solution for any $\vec b \in \mathbb{R}^n$ means that $\mathbb{R}^n$ is the image of $T$, so $\dim \operatorname{Img} T = n$, which by the dimension formula is equivalent to $\dim \ker(T) = 0$. □

The following result is really quite surprising: it says that the number of 
linearly independent columns and the number of linearly independent rows of 
a matrix are equal. 

Proposition 2.5.12. Let $A$ be a matrix. Then the span of the columns of $A$ and the span of the rows of $A$ have the same dimension.


One way to understand this result is to think of constraints on the kernel of $A$. Think of $A$ as the $m \times n$ matrix made up of its rows:

$$A = \begin{bmatrix} A_1 \\ A_2 \\ \vdots \\ A_m \end{bmatrix}. \tag{2.5.11}$$

Then the kernel of $A$ is made up of the vectors $\vec x$ satisfying the linear constraints $A_1\vec x = 0, \dots, A_m\vec x = 0$. Think of adding in these constraints one at a time. Before any constraints are present, the kernel is all of $\mathbb{R}^n$. Each time you add one constraint, you cut down the dimension of the kernel by 1. But this is only true if the new constraint is genuinely new, not a consequence of the previous ones, i.e., if $A_i$ is linearly independent from $A_1, \dots, A_{i-1}$.

Let us call the number of linearly independent rows $A_i$ the row rank of $A$. The argument above leads to the formula

$$\dim \ker A = n - \text{row rank}(A).$$
We defined linear combinations in terms of vectors, but (as we will see in Section 2.6) the same definition can apply to linear combinations of other objects, such as matrices or functions. In this proof we are applying it to row matrices.


The rise of the computer, with 
emphasis on computationally ef- 
fective schemes, has refocused at- 
tention on row reduction as a way 
to solve linear equations. 

Gauss is a notable exception; 
when he needed to solve linear 
equations, he used row reduction. 
In fact, row reduction is also called 
Gaussian elimination. 


The dimension formula says exactly that

$$\dim \ker A = n - \operatorname{rank}(A),$$

so the rank of $A$ and the row rank of $A$ should be equal.

The argument above isn't quite rigorous: it used the intuitively plausible but unjustified "Each time you add one constraint, you cut down the dimension of the kernel by 1." This is true and not hard to prove, but the following argument is shorter (and interesting too).

Proof. Given a matrix $A$, we will call the span of the columns the column space of $A$ and the span of the rows the row space of $A$. Let $\widetilde A$ be the matrix to which $A$ row-reduces. The rows of $\widetilde A$ are linear combinations of the rows of $A$, and vice versa since row operations are reversible. In particular, the row space of $A$ and of $\widetilde A$ coincide.

The rows of $\widetilde A$ that contain pivotal 1's are a basis of the row space of $\widetilde A$: the other rows are zero so they definitely don't contribute to the row space, and the pivotal rows of $\widetilde A$ are linearly independent, since all the other entries in a column containing a pivotal 1 are 0. So the dimension of the row space of $A$ is the number of pivotal 1's of $\widetilde A$, which we have seen is the dimension of the column space of $A$. □

Corollary 2.5.13. A matrix $A$ and its transpose $A^T$ have the same rank.
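A quick numerical check of Corollary 2.5.13, offered as a sketch rather than a proof: we count pivots by elimination for a matrix and for its transpose. The helper names `rank` and `transpose` and the sample matrix are our own choices, not taken from the text.

```python
from fractions import Fraction

def rank(rows):
    """Number of pivots found by Gaussian elimination (exact arithmetic)."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def transpose(rows):
    return [list(col) for col in zip(*rows)]

# Hypothetical example: the second row is twice the first.
A = [[1, 2, 3, 4],
     [2, 4, 6, 8],
     [1, 0, 1, 0]]
print(rank(A), rank(transpose(A)))
```

For this matrix both calls return 2, so by the dimension formula the kernel of $A$ has dimension $4 - 2 = 2$.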

Remark. Proposition 2.5.12 gives us the statement we wanted in Section 2.4: the number of linearly independent equations in a system of linear equations $A\vec x = \vec b$ is the number of pivotal columns of $A$. Basing linear algebra on row reduction can be seen as a return to Euler's way of thinking. It is, as Euler said, immediately apparent why you can't determine $x$ and $y$ from the two equations $3x - 2y = 5$ and $4y = 6x - 10$. (In the original, "La raison de cet accident saute d'abord aux yeux": the reason for this accident leaps to the eyes.) In that case, it is obvious that the second equation is twice the first. When the linear dependence of a system of linear equations no longer leaps to the eyes, row reduction provides a way to make it obvious.

Unfortunately for the history of mathematics, in the same year 1750 that Euler 
wrote his analysis, Gabriel Cramer published a treatment of linear equations 
based on determinants, which rapidly took hold, and the more qualitative ap- 
proach begun by Euler was forgotten. As Jean-Luc Dorier writes in his essay 
on the history of linear algebra, 15 

. . . even if determinants proved themselves a valuable tool for study- 
ing linear equations , it must be admitted that they introduced a certain 
complexity , linked to the technical skill their use requires. This fact had 
the undeniable effect of masking certain intuitive aspects of the nature of 
linear equations ... △

15 J.-L. Dorier, ed., L'Enseignement de l'algèbre linéaire en question, La Pensée Sauvage Éditions, 1997.
Note that by the fundamental theorem of algebra (Theorem 1.6.10), every polynomial can be written as a product of powers of degree 1 polynomials (Equation 2.5.12). Of course finding the $a_i$ means finding the roots of the polynomial, which may be very difficult.


Now let us see an example of the power of Corollary 2.5.11. 

Example 2.5.14 (Partial fractions). Let

$$p(x) = (x - a_1)^{n_1} \cdots (x - a_k)^{n_k} \tag{2.5.12}$$

be a polynomial of degree $n = n_1 + \dots + n_k$, with the $a_i$ distinct; for example,

$x^2 - 1 = (x + 1)(x - 1)$, with $a_1 = -1$, $a_2 = 1$; $n_1 = n_2 = 1$, so that $n = 2$;

$x^3 - 2x^2 + x = x(x - 1)^2$, with $a_1 = 0$, $a_2 = 1$; $n_1 = 1$, $n_2 = 2$, so that $n = 3$.

The claim of partial fractions is the following:


Proposition 2.5.15 (Partial fractions). For any such polynomial $p$ of degree $n$, and any polynomial $q$ of degree $< n$, the rational function $q/p$ can be written uniquely as a sum of simpler terms, called partial fractions:

$$\frac{q(x)}{p(x)} = \frac{q_1(x)}{(x - a_1)^{n_1}} + \dots + \frac{q_k(x)}{(x - a_k)^{n_k}}, \tag{2.5.13}$$

with each $q_i$ a polynomial of degree $< n_i$.


For example, when $q(x) = 2x + 3$ and $p(x) = x^2 - 1$, Proposition 2.5.15 says that there exist polynomials $q_1$ and $q_2$ of degree less than 1 (i.e., numbers, which we will call $A_0$ and $B_0$, the subscript indicating that they are coefficients of the term of degree 0) such that

$$\frac{2x + 3}{x^2 - 1} = \frac{A_0}{x + 1} + \frac{B_0}{x - 1}. \tag{2.5.14}$$

If $q(x) = x^3 - 1$ and $p(x) = (x + 1)^2 (x - 1)^2$, then the proposition says that there exist two polynomials of degree at most 1, $q_1 = A_1 x + A_0$ and $q_2 = B_1 x + B_0$, such that

$$\frac{x^3 - 1}{(x + 1)^2 (x - 1)^2} = \frac{A_1 x + A_0}{(x + 1)^2} + \frac{B_1 x + B_0}{(x - 1)^2}. \tag{2.5.15}$$

In simple cases, it's clear how to find these terms. In the first case above, to find the numerators $A_0$ and $B_0$, we multiply out to get a common denominator:

$$\frac{2x + 3}{x^2 - 1} = \frac{A_0}{x + 1} + \frac{B_0}{x - 1} = \frac{A_0(x - 1) + B_0(x + 1)}{x^2 - 1} = \frac{(A_0 + B_0)x + (B_0 - A_0)}{x^2 - 1},$$

so that we get two linear equations in two unknowns:

$$\begin{aligned} -A_0 + B_0 &= 3 \\ A_0 + B_0 &= 2, \end{aligned} \quad \text{i.e., the constants} \quad B_0 = \frac{5}{2},\ A_0 = -\frac{1}{2}. \tag{2.5.16}$$


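We can think of the system of linear equations on the left-hand side of Equation 2.5.16 as the matrix multiplication

$$\begin{bmatrix} -1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} A_0 \\ B_0 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}. \tag{2.5.17}$$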
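The values in Equation 2.5.16 are easy to check by machine. The sketch below (variable and function names are ours) solves the two equations by adding and subtracting them, then compares both sides of Equation 2.5.14 at a handful of rational sample points away from $x = \pm 1$.

```python
from fractions import Fraction

# Adding the two equations of 2.5.16 gives 2*B0 = 3 + 2;
# subtracting the first from the second gives 2*A0 = 2 - 3.
B0 = Fraction(3 + 2, 2)
A0 = Fraction(2 - 3, 2)

def lhs(x):
    """(2x + 3) / (x^2 - 1)"""
    return (2 * x + 3) / (x * x - 1)

def rhs(x):
    """A0 / (x + 1) + B0 / (x - 1)"""
    return A0 / (x + 1) + B0 / (x - 1)

# Sample points n/3, skipping n = -3 and n = 3 (the poles x = -1 and x = 1).
samples = [Fraction(n, 3) for n in range(-10, 11) if n not in (-3, 3)]
print(A0, B0, all(lhs(x) == rhs(x) for x in samples))
```

Agreement at finitely many points is of course only a sanity check; the identity itself follows from the computation above.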
What is the analogous matrix multiplication for Equation 2.5.15? 16

What about the general case? If we put the right-hand side of Equation 2.5.13 on a common denominator we see that $q(x)/p(x)$ is equal to

$$\frac{q_1(x)(x - a_2)^{n_2} \cdots (x - a_k)^{n_k} + q_2(x)(x - a_1)^{n_1}(x - a_3)^{n_3} \cdots (x - a_k)^{n_k} + \dots + q_k(x)(x - a_1)^{n_1} \cdots (x - a_{k-1})^{n_{k-1}}}{(x - a_1)^{n_1}(x - a_2)^{n_2} \cdots (x - a_k)^{n_k}}. \tag{2.5.18}$$


Corollary 2.5.11: if $T : \mathbb{R}^n \to \mathbb{R}^n$ is a linear transformation, the equation $T(\vec x) = \vec b$ has a solution for any $\vec b \in \mathbb{R}^n$ if and only if the only solution to the equation $T(\vec x) = \vec 0$ is $\vec x = \vec 0$.

We are thinking of the transformation $T$ both as the matrix that takes the coefficients of the $q_i$ and returns the coefficients of $q$, and as the linear function that takes $q_1, \dots, q_k$ and returns the polynomial $q$.


As we did in our simpler cases, we could write this as a system of linear 
equations for the coefficients of the q* and solve by row reduction. But except 
in the simplest cases, computing the matrix would be a big job. Worse, how 
do we know that the system of equations we get has solutions? We might 
worry about investing a lot of work only to discover that the equations were 
incompatible. 

Proposition 2.5.15 assures us that there will always be a solution, and Corol- 
lary 2.5.11 provides the key. 

Proof of Proposition 2.5.15 (Partial fractions). Note that the matrix we would get following the above procedure would necessarily be an $n \times n$ matrix. This matrix gives a linear transformation that has as its input a vector whose entries are the coefficients of $q_1, \dots, q_k$. There are $n$ such coefficients in all. (Each polynomial $q_i$ has $n_i$ coefficients, for terms of degree 0 through $n_i - 1$, and the sum of the $n_i$ equals $n$.) It has as its output a vector giving the $n$ coefficients of $q$ (since $q$ is of degree $< n$, it has $n$ coefficients, for degrees $0, \dots, n - 1$).

Thus the matrix can be thought of as a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^n$, and by Corollary 2.5.11, Proposition 2.5.15 is true if and only if the only solution of $T(q_1, \dots, q_k) = 0$ is $q_1 = \dots = q_k = 0$. This will follow from Lemma 2.5.16:


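16 Multiplying out, we get

$$\frac{x^3(A_1 + B_1) + x^2(-2A_1 + A_0 + 2B_1 + B_0) + x(A_1 - 2A_0 + B_1 + 2B_0) + A_0 + B_0}{(x + 1)^2 (x - 1)^2},$$

so

$$\begin{aligned}
A_0 + B_0 &= -1 && \text{(coefficient of term of degree 0)} \\
A_1 - 2A_0 + B_1 + 2B_0 &= 0 && \text{(coefficient of term of degree 1)} \\
-2A_1 + A_0 + 2B_1 + B_0 &= 0 && \text{(coefficient of term of degree 2)} \\
A_1 + B_1 &= 1 && \text{(coefficient of term of degree 3);}
\end{aligned}$$

i.e.,

$$\begin{bmatrix} 0 & 1 & 0 & 1 \\ 1 & -2 & 1 & 2 \\ -2 & 1 & 2 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} A_1 \\ A_0 \\ B_1 \\ B_0 \end{bmatrix} = \begin{bmatrix} -1 \\ 0 \\ 0 \\ 1 \end{bmatrix}.$$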


The numerator in Equation 2.5.21 is of degree $< n_i$, while the denominator is of degree $n_i$.

This example really put linear algebra to work. Even after translating the problem into linear algebra, via the linear transformation $T$, the answer was not clear; only after using the dimension formula is the result apparent. The dimension formula (or rather, Corollary 2.5.11, the dimension formula applied to transformations from $\mathbb{R}^n$ to $\mathbb{R}^n$) tells us that if $T : \mathbb{R}^n \to \mathbb{R}^n$ is one to one (solutions are unique), then it is onto (solutions exist).

Still, all of this is nothing more than the intuitively obvious statement that either $n$ equations in $n$ unknowns are independent, the good case, or everything goes wrong at once: the transformation is not one to one and therefore not onto.
Lemma 2.5.16. If $q_i \neq 0$ is a polynomial of degree $< n_i$, then

$$\lim_{x \to a_i} \left| \frac{q_i(x)}{(x - a_i)^{n_i}} \right| = \infty. \tag{2.5.19}$$


That is, if $q_i \neq 0$, then $q_i(x)/(x - a_i)^{n_i}$ blows up to infinity.

Proof. It is clear that for values of $x$ very close to $a_i$, the denominator $(x - a_i)^{n_i}$ will get very small; if all goes well the entire term will then get very big. But we have to work a bit to make sure both that the numerator does not get small equally fast, and that the other terms of the polynomial don't compensate.

Let us make the change of variables $u = x - a_i$, so that

$$q_i(x) = q_i(u + a_i), \text{ which we will denote } \tilde q_i(u). \tag{2.5.20}$$

Then we have

$$\lim_{u \to 0} \left| \frac{\tilde q_i(u)}{u^{n_i}} \right| = \infty \quad \text{if } q_i \neq 0. \tag{2.5.21}$$

Indeed, if $q_i \neq 0$, then $\tilde q_i \neq 0$, and there exists a number $m < n_i$ such that

$$\tilde q_i(u) = a_m u^m + \dots + a_{n_i - 1} u^{n_i - 1} \tag{2.5.22}$$

with $a_m \neq 0$. (This $a_m$ is the first nonzero coefficient; as $u \to 0$, the term $a_m u^m$ is bigger than all the other terms.) Dividing by $u^{n_i}$ we can write

$$\frac{\tilde q_i(u)}{u^{n_i}} = \frac{1}{u^{n_i - m}} (a_m + \dots), \tag{2.5.23}$$

where the dots ... represent terms containing $u$ to a positive power, since $m < n_i$. In particular,

$$\text{as } u \to 0, \quad \left| \frac{1}{u^{n_i - m}} \right| \to \infty \quad \text{and} \quad (a_m + \dots) \to a_m. \tag{2.5.24}$$

We see that as $x \to a_i$, the term $q_i(x)/(x - a_i)^{n_i}$ blows up to infinity: the denominator gets smaller and smaller while the numerator tends to $a_m \neq 0$. This ends the proof of Lemma 2.5.16.

Proof of Proposition 2.5.15, continued. Suppose $q_i \neq 0$. For all the other terms $q_j$, $j \neq i$, the rational functions

$$\frac{q_j(x)}{(x - a_j)^{n_j}} \tag{2.5.25}$$

have the finite limits $q_j(a_i)/(a_i - a_j)^{n_j}$ as $x \to a_i$, and therefore the sum

$$\frac{q(x)}{p(x)} = \frac{q_1(x)}{(x - a_1)^{n_1}} + \dots + \frac{q_k(x)}{(x - a_k)^{n_k}}$$

has infinite limit as $x \to a_i$ and $q$ cannot vanish identically. So $T(q_1, \dots, q_k) \neq 0$ if some $q_i \neq 0$, and we can conclude, without having to compute any matrices or solve any systems of equations, that Proposition 2.5.15 is correct: for any polynomial $p$ of degree $n$, and any polynomial $q$ of degree $< n$, the rational function $q/p$ can be written uniquely as a sum of partial fractions. □
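The proof never needed the matrix of $T$, but for a specific $p$ one can still build it and solve. The following sketch does so under our own conventions, none of which come from the text: polynomials are coefficient lists with the lowest degree first, `roots` lists the pairs $(a_i, n_i)$, the unknowns are ordered as the coefficients of $q_1$ (degree 0 upward), then $q_2$, and so on, and `solve` assumes the matrix is invertible, which the proposition guarantees here.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, lowest degree first."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def cofactor(roots, skip):
    """Product of (x - a_j)^(n_j) over all j != skip."""
    prod = [Fraction(1)]
    for j, (a, n) in enumerate(roots):
        if j != skip:
            for _ in range(n):
                prod = poly_mul(prod, [Fraction(-a), Fraction(1)])
    return prod

def matrix_of_T(roots):
    """Columns are the images under T of the monomials x^d of each q_i."""
    n = sum(mult for _, mult in roots)
    cols = []
    for i, (_, mult) in enumerate(roots):
        c = cofactor(roots, i)
        for d in range(mult):
            col = poly_mul([Fraction(0)] * d + [Fraction(1)], c)
            cols.append(col + [Fraction(0)] * (n - len(col)))
    return [[col[r] for col in cols] for r in range(n)]

def solve(M, b):
    """Gauss-Jordan elimination; assumes M is square and invertible."""
    n = len(M)
    aug = [list(row) + [Fraction(v)] for row, v in zip(M, b)]
    for c in range(n):
        p = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        aug[c] = [x / aug[c][c] for x in aug[c]]
        for i in range(n):
            if i != c:
                f = aug[i][c]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[c])]
    return [row[-1] for row in aug]

# p(x) = (x + 1)^2 (x - 1)^2 and q(x) = x^3 - 1, as in Equation 2.5.15.
roots = [(-1, 2), (1, 2)]
q = [-1, 0, 0, 1]                      # coefficients of 1, x, x^2, x^3
coeffs = solve(matrix_of_T(roots), q)  # ordered [A0, A1, B0, B1]
print(coeffs)
```

With `roots = [(-1, 1), (1, 1)]` the function `matrix_of_T` reproduces the matrix of Equation 2.5.17. The rows here are ordered by increasing degree and the unknowns as $A_0, A_1, B_0, B_1$, so the $4 \times 4$ system is the one of footnote 16 up to a reordering of rows and columns.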

2.6 An Introduction to Abstract Vector Spaces 


We have already used vector spaces that are not $\mathbb{R}^n$: in Section 1.4 we considered an $m \times n$ matrix as a point in $\mathbb{R}^{nm}$, and in Example 1.7.15 we spoke of the "space" $P_k$ of polynomials of degree at most $k$. In each case we "identified" the space with $\mathbb{R}^N$ for some appropriate $N$: we identified the space of $m \times n$ matrices with $\mathbb{R}^{nm}$, and we identified $P_k$ with $\mathbb{R}^{k+1}$. But just what "identifying" means is not quite clear, and difficulties with such identifications become more and more cumbersome.


You may think of these eight rules as the "essence of $\mathbb{R}^n$," abstracting from the vector space $\mathbb{R}^n$ all its most important properties, except its distinguished standard basis. This allows us to work with other vector spaces, whose elements are not naturally defined in terms of lists of numbers.


In this section we give a very brief introduction to abstract vector spaces , 
introducing vocabulary that will be useful later in the book, particularly in 
Chapter 6 on forms. 

As we will see in a moment, a vector space is a set in which elements can be added and multiplied by numbers. We need to decide what numbers we are using, and for our purposes there are only two interesting choices: real or complex numbers. Mainly to keep the psychological load lighter, we will restrict our discussion to real numbers, and consider only real vector spaces, to be called simply "vector spaces" from now on. (Virtually everything to be discussed would work just as well for complex numbers.)

We will denote a vector in an abstract vector space by an underlined bold letter, to distinguish it from a vector in $\mathbb{R}^n$: $\underline{\mathbf{v}} \in V$ as opposed to $\vec v \in \mathbb{R}^n$.

A vector space is anything that satisfies the following rules. 

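Definition 2.6.1 (Vector space). A vector space is a set $V$ of vectors such that two vectors can be added to form another vector, and a vector can be multiplied by a scalar in $\mathbb{R}$ to form another vector. This addition and multiplication must satisfy the following eight rules:

(1) Additive identity. There exists a vector $\underline{\mathbf{0}} \in V$ such that for any $\underline{\mathbf{v}} \in V$, $\underline{\mathbf{0}} + \underline{\mathbf{v}} = \underline{\mathbf{v}}$.

(2) Additive inverse. For any $\underline{\mathbf{v}} \in V$, there exists a vector $-\underline{\mathbf{v}} \in V$ such that $\underline{\mathbf{v}} + (-\underline{\mathbf{v}}) = \underline{\mathbf{0}}$.

(3) Commutative law for addition. For all $\underline{\mathbf{v}}, \underline{\mathbf{w}} \in V$, we have $\underline{\mathbf{v}} + \underline{\mathbf{w}} = \underline{\mathbf{w}} + \underline{\mathbf{v}}$.

(4) Associative law for addition. For all $\underline{\mathbf{v}}_1, \underline{\mathbf{v}}_2, \underline{\mathbf{v}}_3 \in V$, we have $\underline{\mathbf{v}}_1 + (\underline{\mathbf{v}}_2 + \underline{\mathbf{v}}_3) = (\underline{\mathbf{v}}_1 + \underline{\mathbf{v}}_2) + \underline{\mathbf{v}}_3$.

(5) Multiplicative identity. For all $\underline{\mathbf{v}} \in V$ we have $1\underline{\mathbf{v}} = \underline{\mathbf{v}}$.

(6) Associative law for multiplication. For all $\alpha, \beta \in \mathbb{R}$ and all $\underline{\mathbf{v}} \in V$, we have $\alpha(\beta\underline{\mathbf{v}}) = (\alpha\beta)\underline{\mathbf{v}}$.

(7) Distributive law for scalar addition. For all scalars $\alpha, \beta \in \mathbb{R}$ and all $\underline{\mathbf{v}} \in V$, we have $(\alpha + \beta)\underline{\mathbf{v}} = \alpha\underline{\mathbf{v}} + \beta\underline{\mathbf{v}}$.

(8) Distributive law for vector addition. For all scalars $\alpha \in \mathbb{R}$ and $\underline{\mathbf{v}}, \underline{\mathbf{w}} \in V$, we have $\alpha(\underline{\mathbf{v}} + \underline{\mathbf{w}}) = \alpha\underline{\mathbf{v}} + \alpha\underline{\mathbf{w}}$.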
Note that in Example 2.6.2 our assumption that addition is well defined in $C(0, 1)$ uses the fact that the sum of two continuous functions is continuous. Similarly, multiplication by scalars is well defined, because the product of a continuous function by a constant is continuous.


In some sense, this space "is" $\mathbb{R}^2$, by identifying $f_{a,b}$ with $\begin{bmatrix} a \\ b \end{bmatrix} \in \mathbb{R}^2$; this was not obvious from the definition.


The primordial example of a vector space is of course $\mathbb{R}^n$ itself. More generally, a subset of $\mathbb{R}^n$ (endowed with the same addition and multiplication by scalars as $\mathbb{R}^n$ itself) is a vector space in its own right if and only if it is a subspace of $\mathbb{R}^n$ (Definition 1.1.5).

Other examples that are fairly easy to understand are the space $\operatorname{Mat}(n, m)$ of $n \times m$ matrices, with addition and multiplication defined in Section 1.2, and the space $P_k$ of polynomials of degree at most $k$. In fact, these are easy to "identify with $\mathbb{R}^n$."

But other vector spaces have a different flavor: they are somehow much too 
big. 

Example 2.6.2 (An infinite-dimensional vector space). Consider the space $C(0, 1)$ of continuous real-valued functions $f(x)$ defined for $0 < x < 1$. The "vectors" of this space are functions $f : (0, 1) \to \mathbb{R}$, with addition defined as usual by $(f + g)(x) = f(x) + g(x)$ and multiplication by $(\alpha f)(x) = \alpha f(x)$. △

Exercise 2.6.1 asks you to show that this space satisfies all eight requirements 
for a vector space. 
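To make the operations of Example 2.6.2 concrete, here is a small sketch in which Python functions play the role of "vectors". The names `add`, `scale`, `zero` and `same` are ours, and comparing values at a few sample points in $(0, 1)$ is only a spot-check of some of the eight rules, not the proof that Exercise 2.6.1 asks for.

```python
import math

def add(f, g):
    """Pointwise sum: (f + g)(x) = f(x) + g(x)."""
    return lambda x: f(x) + g(x)

def scale(a, f):
    """Scalar multiple: (a f)(x) = a * f(x)."""
    return lambda x: a * f(x)

zero = lambda x: 0.0  # the additive identity is the zero function

f, g = math.sin, math.exp
samples = [0.1 * k for k in range(1, 10)]  # points of (0, 1)

def same(u, v):
    """Compare two functions at the sample points only."""
    return all(math.isclose(u(x), v(x), rel_tol=1e-12, abs_tol=1e-12)
               for x in samples)

checks = {
    "commutative": same(add(f, g), add(g, f)),
    "identity": same(add(zero, f), f),
    "inverse": same(add(f, scale(-1.0, f)), zero),
    "distributive": same(scale(2.5, add(f, g)),
                         add(scale(2.5, f), scale(2.5, g))),
}
print(checks)
```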

The vector space $C(0, 1)$ cannot be identified with $\mathbb{R}^n$; there is no linear transformation from any $\mathbb{R}^n$ to this space that is onto, as we will see in detail in Example 2.6.20. But it has subspaces that can be identified with appropriate $\mathbb{R}^n$'s, as seen in Example 2.6.3, and also subspaces that cannot.

Example 2.6.3 (A finite-dimensional subspace of $C(0, 1)$). Consider the space of twice differentiable functions $f : (0, 1) \to \mathbb{R}$ such that $D^2 f = 0$ (i.e., functions of one variable whose second derivatives are 0; we could also write this $f'' = 0$). This is a subspace of the vector space of Example 2.6.2, and is a vector space itself. But since a function has a vanishing second derivative if and only if it is a polynomial of degree at most 1, we see that this space is the set of functions

$$f_{a,b}(x) = a + bx. \tag{2.6.1}$$

Precisely two numbers are needed to specify each element of this vector space; we could choose as our basis 1 and $x$.

On the other hand, the subspace $C^1(0, 1) \subset C(0, 1)$ of once continuously differentiable functions on $(0, 1)$ also cannot be identified with any $\mathbb{R}^N$; the elements are more restricted than those of $C(0, 1)$, but not enough so that an element can be specified by finitely many numbers.

Linear transformations 

In Sections 1.2 and 1.3 we investigated linear transformations $\mathbb{R}^n \to \mathbb{R}^m$. Now we wish to define linear transformations from one (abstract) vector space to another.
Equation 2.6.2 is a shorter way of writing both

$$T(\underline{\mathbf{v}}_1 + \underline{\mathbf{v}}_2) = T(\underline{\mathbf{v}}_1) + T(\underline{\mathbf{v}}_2)$$

and

$$T(\alpha\underline{\mathbf{v}}_1) = \alpha T(\underline{\mathbf{v}}_1).$$


In order to write a linear transformation from one abstract vector space to another as a matrix, you have to choose bases: one in the domain, one in the range. As long as you are in finite-dimensional vector spaces, you can do this. In infinite-dimensional vector spaces, bases usually do not exist.


In Example 2.6.6, $C[0, 1]$ is the space of continuous real-valued functions $f(x)$ defined for $0 \le x \le 1$.

The function $g$ in Example 2.6.6 is very much like a matrix, and the formula for $T_g$ looks a lot like $\sum_j g_{i,j} f_j$. This is the kind of thing we meant above when we referred to "analogs" of matrices; it is as much like a matrix as you can hope to get in this particular infinite-dimensional setting. But it is not true that all transformations from $C[0, 1]$ to $C[0, 1]$ are of this sort; even the identity cannot be written in the form $T_g$.


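Definition 2.6.4 (Linear transformation). If $V$ and $W$ are vector spaces, a linear transformation $T : V \to W$ is a mapping satisfying

$$T(\alpha\underline{\mathbf{v}}_1 + \beta\underline{\mathbf{v}}_2) = \alpha T(\underline{\mathbf{v}}_1) + \beta T(\underline{\mathbf{v}}_2) \tag{2.6.2}$$

for all scalars $\alpha, \beta \in \mathbb{R}$ and all $\underline{\mathbf{v}}_1, \underline{\mathbf{v}}_2 \in V$.

In Section 1.3 we saw that every linear transformation $T : \mathbb{R}^m \to \mathbb{R}^n$ is given by the matrix in $\operatorname{Mat}(n, m)$ whose $i$th column is $T(\vec e_i)$ (Theorem 1.3.14). This provides a complete understanding of linear transformations from $\mathbb{R}^m$ to $\mathbb{R}^n$.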

In the setting of more abstract vector spaces, linear transformations don’t 
have this wonderful concreteness. In finite-dimensional vector spaces, it is still 
possible to understand a linear transformation as a matrix but you have to work 
at it; in particular, you must choose a basis for the domain and a basis for the 
range. (For infinite-dimensional vector spaces, bases usually do not exist, and 
matrices and their analogs are usually not available.) 

Even when it is possible to write a linear transformation as a matrix, it may 
not be the easiest way to deal with things, as shown in Example 2.6.5. 

Example 2.6.5 (A linear transformation difficult to write as a matrix). If $A \in \operatorname{Mat}(n, n)$, then the transformation $\operatorname{Mat}(n, n) \to \operatorname{Mat}(n, n)$ given by $H \mapsto AH + HA$ is a linear transformation, which we encountered in Example 1.7.15 as the derivative of the mapping $S : A \mapsto A^2$:

$$[\mathbf{D}S(A)]H = AH + HA. \tag{2.6.3}$$

Even in the case $n = 3$ it would be difficult, although possible, to write this transformation as a $9 \times 9$ matrix; the language of abstract linear transformations is more appropriate. △
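By hand the $9 \times 9$ matrix is tedious, but a machine produces it readily once a basis is chosen. The sketch below assumes one particular choice that the text does not specify: the basis of matrices $E_{ij}$ (a single 1 in position $(i, j)$), taken row by row, so that a $3 \times 3$ matrix is identified with the list of its nine entries. The names `matmul`, `T`, `flatten` and `matrix_of_T`, and the sample matrices, are ours.

```python
def matmul(X, Y):
    """Product of two n x n matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def T(A, H):
    """The transformation H -> AH + HA."""
    AH, HA = matmul(A, H), matmul(H, A)
    n = len(A)
    return [[AH[i][j] + HA[i][j] for j in range(n)] for i in range(n)]

def flatten(M):
    """Identify an n x n matrix with a vector of length n*n, row by row."""
    return [x for row in M for x in row]

def matrix_of_T(A):
    """n^2 x n^2 matrix whose columns are the flattened images of the E_ij."""
    n = len(A)
    cols = []
    for i in range(n):
        for j in range(n):
            E = [[1 if (r, c) == (i, j) else 0 for c in range(n)]
                 for r in range(n)]
            cols.append(flatten(T(A, E)))
    return [[col[r] for col in cols] for r in range(n * n)]

A = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]
H = [[2, -1, 0], [1, 0, 5], [0, 3, 1]]
M = matrix_of_T(A)  # a 9 x 9 matrix
h = flatten(H)
Mh = [sum(M[r][c] * h[c] for c in range(9)) for r in range(9)]
print(Mh == flatten(T(A, H)))
```

Multiplying the $9 \times 9$ matrix by the flattened $H$ gives the same nine numbers as computing $AH + HA$ directly, though the matrix itself reveals little about the structure of the transformation.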

Example 2.6.6 (Showing that a transformation is linear). Let us show that if $g(x, y)$ is a continuous function on $[0, 1] \times [0, 1]$, then the transformation $T_g : C[0, 1] \to C[0, 1]$ given by

$$(T_g(f))(x) = \int_0^1 g(x, y) f(y)\, dy \tag{2.6.4}$$

is a linear transformation. For example, if $g(x, y) = |x - y|$, then we would have the linear transformation $(T_g(f))(x) = \int_0^1 |x - y| f(y)\, dy$.

To show that Equation 2.6.4 is a linear transformation, we first show that

$$T_g(f_1 + f_2) = T_g(f_1) + T_g(f_2), \tag{2.6.5}$$

which we do as follows:
The linear transformation in Example 2.6.7 is a special kind of linear transformation, called a linear differential operator. Solving a differential equation is the same as looking for the kernel of such a linear transformation. The coefficients could be any functions of $x$, so it's an example of an important class.


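$$\begin{aligned}
(T_g(f_1 + f_2))(x) &= \int_0^1 g(x, y)(f_1 + f_2)(y)\, dy \\
&= \int_0^1 g(x, y)\bigl(f_1(y) + f_2(y)\bigr)\, dy \quad \text{(definition of addition in vector space)} \\
&= \int_0^1 \bigl(g(x, y) f_1(y) + g(x, y) f_2(y)\bigr)\, dy \\
&= \int_0^1 g(x, y) f_1(y)\, dy + \int_0^1 g(x, y) f_2(y)\, dy \\
&= (T_g(f_1))(x) + (T_g(f_2))(x) = (T_g(f_1) + T_g(f_2))(x).
\end{aligned} \tag{2.6.6}$$

Next we show that $T_g(\alpha f)(x) = \alpha T_g(f)(x)$:

$$T_g(\alpha f)(x) = \int_0^1 g(x, y)(\alpha f)(y)\, dy = \alpha \int_0^1 g(x, y) f(y)\, dy = \alpha T_g(f)(x). \tag{2.6.7}$$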
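The two computations above can be mirrored numerically. In the sketch below the integral is replaced by a midpoint-rule sum, which is our own approximation and not part of the text; the function names, the test functions and the constant `alpha` are likewise illustrative choices. Since a finite sum is itself linear in $f$, the two sides agree up to floating-point rounding.

```python
def T_g(g, f, n=200):
    """Midpoint-rule approximation of x -> integral over [0, 1] of g(x, y) f(y) dy."""
    ys = [(k + 0.5) / n for k in range(n)]
    return lambda x: sum(g(x, y) * f(y) for y in ys) / n

g = lambda x, y: abs(x - y)  # the kernel used as an example in the text
f1 = lambda y: y * y
f2 = lambda y: 1.0 - y
alpha = 3.0

# T_g(f1 + alpha f2) versus T_g(f1) + alpha T_g(f2)
lhs = T_g(g, lambda y: f1(y) + alpha * f2(y))
rhs = lambda x: T_g(g, f1)(x) + alpha * T_g(g, f2)(x)

max_gap = max(abs(lhs(x) - rhs(x)) for x in [0.0, 0.25, 0.5, 0.75, 1.0])
print(max_gap)
```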


Example 2.6.7 (A linear differential operator). The transformation $T : C^2(\mathbb{R}) \to C(\mathbb{R})$ given by the formula

$$(T(f))(x) = (x^2 + 1) f''(x) - x f'(x) + 2 f(x) \tag{2.6.8}$$

is a linear transformation, as Exercise 2.6.2 asks you to show.
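Restricted to polynomials, the operator of Equation 2.6.8 can be applied exactly by manipulating coefficient lists. The sketch below is such a restriction, with our own conventions (lowest degree first; helper names `deriv`, `add`, `mul`, `scale`); it checks linearity on one pair of polynomials and is not a substitute for Exercise 2.6.2.

```python
from fractions import Fraction

def deriv(p):
    """Derivative of a polynomial (coefficients, lowest degree first)."""
    return [k * c for k, c in enumerate(p)][1:] or [0]

def add(p, q):
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def scale(a, p):
    return [a * c for c in p]

def T(f):
    """(T f)(x) = (x^2 + 1) f''(x) - x f'(x) + 2 f(x), for polynomial f."""
    f1 = deriv(f)
    f2 = deriv(f1)
    return add(add(mul([1, 0, 1], f2), scale(-1, mul([0, 1], f1))), scale(2, f))

f = [0, 0, 1]      # x^2
g = [1, -3, 0, 2]  # 1 - 3x + 2x^3
a, b = Fraction(1, 2), 4
left = T(add(scale(a, f), scale(b, g)))       # T(a f + b g)
right = add(scale(a, T(f)), scale(b, T(g)))   # a T(f) + b T(g)
print(T(f), left == right)
```

For instance, $T(x^2) = 2(x^2 + 1) - 2x^2 + 2x^2 = 2x^2 + 2$, which is the list `[2, 0, 2]`.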

Linear independence, span and bases 

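In Section 2.4 we discussed linear independence, span and bases for $\mathbb{R}^n$ and subspaces of $\mathbb{R}^n$. Extending these notions to arbitrary real vector spaces requires somewhat more work. However, we will be able to tap into what we have already done.

Let $V$ be a vector space and let $\{\underline{\mathbf{v}}\} = \underline{\mathbf{v}}_1, \dots, \underline{\mathbf{v}}_m$ be a finite collection of vectors in $V$.

Definition 2.6.8 (Linear combination). A linear combination of the vectors $\underline{\mathbf{v}}_1, \dots, \underline{\mathbf{v}}_m$ is a vector $\underline{\mathbf{v}}$ of the form

$$\underline{\mathbf{v}} = \sum_{i=1}^m a_i \underline{\mathbf{v}}_i, \quad \text{with } a_1, \dots, a_m \in \mathbb{R}. \tag{2.6.9}$$


Definition 2.6.9 (Span). The collection of vectors $\{\underline{\mathbf{v}}\} = \underline{\mathbf{v}}_1, \dots, \underline{\mathbf{v}}_m$ spans $V$ if and only if all vectors of $V$ are linear combinations of $\underline{\mathbf{v}}_1, \dots, \underline{\mathbf{v}}_m$.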
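Definition 2.6.10 (Linear independence). The vectors $\underline{\mathbf{v}}_1, \dots, \underline{\mathbf{v}}_m$ are linearly independent if and only if any one (hence all) of the following three equivalent conditions are met:

(1) There is only one way of writing a given linear combination; i.e.,

$$\sum_{i=1}^m a_i \underline{\mathbf{v}}_i = \sum_{i=1}^m b_i \underline{\mathbf{v}}_i \quad \text{implies} \quad a_1 = b_1,\ a_2 = b_2,\ \dots,\ a_m = b_m. \tag{2.6.10}$$

(2) The only solution to

$$a_1\underline{\mathbf{v}}_1 + a_2\underline{\mathbf{v}}_2 + \dots + a_m\underline{\mathbf{v}}_m = \underline{\mathbf{0}} \quad \text{is} \quad a_1 = a_2 = \dots = a_m = 0. \tag{2.6.11}$$

(3) None of the $\underline{\mathbf{v}}_i$ is a linear combination of the others.


Definition 2.6.11 (Basis). A set of vectors $\underline{\mathbf{v}}_1, \dots, \underline{\mathbf{v}}_m \in V$ is a basis of $V$ if and only if it is linearly independent and spans $V$.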


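The following definition is central. It enables us to move from the concrete world of $\mathbb{R}^n$ to the abstract world of a vector space $V$.


The concrete to abstract function $\Phi_{\{\underline{\mathbf{v}}\}}$ ("Phi v") takes a column vector $\vec a \in \mathbb{R}^m$ and gives an abstract vector $\underline{\mathbf{v}} \in V$.


Definition 2.6.12 ("Concrete to abstract" function $\Phi_{\{\underline{\mathbf{v}}\}}$). Let $V$ be a vector space, and let $\{\underline{\mathbf{v}}\} = \underline{\mathbf{v}}_1, \dots, \underline{\mathbf{v}}_m$ be a finite collection of vectors in $V$. The "concrete to abstract" function $\Phi_{\{\underline{\mathbf{v}}\}}$ is the linear transformation $\Phi_{\{\underline{\mathbf{v}}\}} : \mathbb{R}^m \to V$ given by the formula

$$\Phi_{\{\underline{\mathbf{v}}\}}\left(\begin{bmatrix} a_1 \\ \vdots \\ a_m \end{bmatrix}\right) = a_1\underline{\mathbf{v}}_1 + \dots + a_m\underline{\mathbf{v}}_m. \tag{2.6.12}$$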


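Example 2.6.13 (Concrete to abstract function). Let $P_2$ be the space of polynomials of degree at most 2, and consider its basis $\underline{\mathbf{v}}_1 = 1$, $\underline{\mathbf{v}}_2 = x$, $\underline{\mathbf{v}}_3 = x^2$. Then

$$\Phi_{\{\underline{\mathbf{v}}\}}\left(\begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}\right) = a_1 + a_2 x + a_3 x^2$$

identifies $P_2$ with $\mathbb{R}^3$. △


Example 2.6.14 (To interpret a column vector, the basis matters). If $V = \mathbb{R}^2$ and $\{\vec e\}$ is the standard basis, then

$$\Phi_{\{\vec e\}}\left(\begin{bmatrix} a \\ b \end{bmatrix}\right) = \begin{bmatrix} a \\ b \end{bmatrix}, \quad \text{since} \quad \Phi_{\{\vec e\}}\left(\begin{bmatrix} a \\ b \end{bmatrix}\right) = a\vec e_1 + b\vec e_2 = \begin{bmatrix} a \\ b \end{bmatrix}. \tag{2.6.13}$$

(If $V = \mathbb{R}^n$, and $\{\vec e\}$ is the standard basis, then $\Phi_{\{\vec e\}}$ is always the identity.)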
Choosing a basis is analogous to choosing a language. A language gives names to an object or an idea; a basis gives a name to a vector living in an abstract vector space. A vector has many embodiments, just as the words book, livre, Buch ... all mean the same thing, in different languages.

In Example 2.6.14, the function $\Phi_{\{\underline{\mathbf{v}}\}}$ is given by the matrix $\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$; in this case, both the domain and the range are $\mathbb{R}^2$.

You were asked to prove Proposition 2.6.15 in the context of subspaces of $\mathbb{R}^n$, in Exercise 2.4.9.


Why study abstract vector spaces? Why not just stick to $\mathbb{R}^n$? One reason is that $\mathbb{R}^n$ comes with the standard basis, which may not be the best basis for the problem at hand. Another is that when you prove something about $\mathbb{R}^n$, you then need to check that your proof was "basis independent" before you can extend it to an arbitrary vector space.

Exercise 2.4.9 asked you to prove Proposition 2.6.15 when $V$ is a subspace of $\mathbb{R}^m$.


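If instead we used the basis $\vec v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, $\vec v_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$, then

$$\Phi_{\{\vec v\}}\left(\begin{bmatrix} a \\ b \end{bmatrix}\right) = a\vec v_1 + b\vec v_2 = \begin{bmatrix} a + b \\ a - b \end{bmatrix}; \tag{2.6.14}$$

$\begin{bmatrix} a \\ b \end{bmatrix}$ in the new basis equals $\begin{bmatrix} a + b \\ a - b \end{bmatrix}$ in the standard basis. △

Proposition 2.6.15 says that if $\{\underline{\mathbf{v}}\}$ is a basis of $V$, then the linear transformation $\Phi_{\{\underline{\mathbf{v}}\}} : \mathbb{R}^n \to V$ allows us to identify $\mathbb{R}^n$ with $V$, and replace questions about $V$ with questions about the coefficients in $\mathbb{R}^n$; any vector space with a basis is "just like" $\mathbb{R}^n$. A look at the proof should convince you that this is just a change of language, without mathematical content.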


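Proposition 2.6.15 (Linear independence, span, basis). If $V$ is a vector space, and $\{\underline{\mathbf{v}}\} = \underline{\mathbf{v}}_1, \dots, \underline{\mathbf{v}}_n$ are vectors in $V$, then:

(1) The set $\{\underline{\mathbf{v}}\}$ is linearly independent if and only if $\Phi_{\{\underline{\mathbf{v}}\}}$ is one to one.

(2) The set $\{\underline{\mathbf{v}}\}$ spans $V$ if and only if $\Phi_{\{\underline{\mathbf{v}}\}}$ is onto.

(3) The set $\{\underline{\mathbf{v}}\}$ is a basis of $V$ if and only if $\Phi_{\{\underline{\mathbf{v}}\}}$ is one to one and onto (i.e., invertible).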


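When $\{\underline{\mathbf{v}}\}$ is a basis, then $\Phi_{\{\underline{\mathbf{v}}\}}^{-1}$ is the "abstract to concrete" transformation. It takes an element in $V$ and gives the ordered list of its coordinates, with regard to the basis $\{\underline{\mathbf{v}}\}$. While $\Phi_{\{\underline{\mathbf{v}}\}}$ synthesizes, $\Phi_{\{\underline{\mathbf{v}}\}}^{-1}$ decomposes: taking the function of Example 2.6.13, we have

$$\Phi_{\{\underline{\mathbf{v}}\}}\left(\begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}\right) = a_1 + a_2 x + a_3 x^2, \qquad \Phi_{\{\underline{\mathbf{v}}\}}^{-1}(a_1 + a_2 x + a_3 x^2) = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}. \tag{2.6.15}$$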
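For the basis $\vec v_1 = (1, 1)$, $\vec v_2 = (1, -1)$ of Example 2.6.14, both directions can be written out explicitly. The following is a minimal sketch with hypothetical names (`make_phi`, `phi`, `phi_inv`); the inverse is hard-coded for this one basis by solving $a + b = x$, $a - b = y$, rather than computed in general.

```python
from fractions import Fraction

def make_phi(basis):
    """Return Phi: coordinates (a_1, ..., a_m) -> a_1 v_1 + ... + a_m v_m."""
    def phi(coords):
        dim = len(basis[0])
        return [sum(a * v[k] for a, v in zip(coords, basis)) for k in range(dim)]
    return phi

# Basis of Example 2.6.14: v1 = (1, 1), v2 = (1, -1).
phi = make_phi([[1, 1], [1, -1]])

def phi_inv(w):
    """Abstract to concrete, for this basis only: a = (x + y)/2, b = (x - y)/2."""
    x, y = w
    return [Fraction(x + y, 2), Fraction(x - y, 2)]

w = phi([3, 5])    # synthesize: 3 v1 + 5 v2
print(w, phi_inv(w))  # decomposing recovers the coordinates 3 and 5
```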


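Proof. (1) Definition 2.6.10 says that $\underline{\mathbf{v}}_1, \dots, \underline{\mathbf{v}}_n$ are linearly independent if

$$\sum_{i=1}^n a_i \underline{\mathbf{v}}_i = \sum_{i=1}^n b_i \underline{\mathbf{v}}_i \quad \text{implies} \quad a_1 = b_1,\ a_2 = b_2,\ \dots,\ a_n = b_n. \tag{2.6.16}$$

That is exactly saying that $\Phi_{\{\underline{\mathbf{v}}\}}(\vec a) = \Phi_{\{\underline{\mathbf{v}}\}}(\vec b)$ if and only if $\vec a = \vec b$, i.e., that $\Phi_{\{\underline{\mathbf{v}}\}}$ is one to one.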
The use of $\Phi_{\{v\}}$ and its inverse
to identify an abstract vector
space with $\mathbb{R}^n$ is very effective but
is generally considered ugly; working
directly in the world of abstract
vector spaces is seen as more
aesthetically pleasing. We have
some sympathy with this view.


Exercise 2.6.3 asks you to show 
that in a vector space of dimen- 
sion n, more than n vectors are 
never linearly independent, and 
fewer than n vectors never span. 


How do we know that $\Phi_{\{v\}}^{-1}$ and
$\Phi_{\{w\}}^{-1}$ exist? By Proposition 2.6.15,
the fact that {v} and {w} are
bases means that $\Phi_{\{v\}}$ and $\Phi_{\{w\}}$
are invertible.

It is often easier to understand
a composition if one writes it in
diagram form, as in

$$\mathbb{R}^k \xrightarrow{\ \Phi_{\{v\}}\ } V \xrightarrow{\ \Phi_{\{w\}}^{-1}\ } \mathbb{R}^p,$$

in Equation 2.6.18. When writing
this diagram, one reverses the order,
following the order in which
the computations are done.

Equation 2.6.18 is the general 
case of Equation 2.4.30, where we 
showed that any two bases of a 
subspace of R n have the same 
number of elements.


(2) Definition 2.6.9 says that $\{v\} = \mathbf{v}_1, \dots, \mathbf{v}_n$ span V if and only if all
vectors of V are linear combinations of $\mathbf{v}_1, \dots, \mathbf{v}_n$; i.e., any vector $\mathbf{v} \in V$ can
be written

$$\mathbf{v} = a_1 \mathbf{v}_1 + \cdots + a_n \mathbf{v}_n = \Phi_{\{v\}}(\mathbf{a}). \tag{2.6.17}$$

In other words, $\Phi_{\{v\}}$ is onto.

(3) Putting these together, $\mathbf{v}_1, \dots, \mathbf{v}_n$ is a basis if and only if it is linearly
independent and spans V, i.e., if $\Phi_{\{v\}}$ is one to one and onto. □


The dimension of a vector space 

The most important result about bases is the following statement. 

Theorem 2.6.16. Any two bases of a vector space have the same number of 
elements. 

The number of elements in a basis of a vector space V is called the dimension 
of V, denoted dim: 

Definition 2.6.17 (Dimension of a vector space). The dimension of a 
vector space is the number of elements of a basis of that space. 


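Proof of Theorem 2.6.16. Let {v} and {w} be two bases of a vector space
V: {v} the set of k vectors $\mathbf{v}_1, \dots, \mathbf{v}_k$, so that $\Phi_{\{v\}}$ is a linear transformation
from $\mathbb{R}^k$ to V, and {w} the set of p vectors $\mathbf{w}_1, \dots, \mathbf{w}_p$, so that $\Phi_{\{w\}}$ is a linear
transformation from $\mathbb{R}^p$ to V. Then the linear transformation

$$\underbrace{\Phi_{\{w\}}^{-1} \circ \Phi_{\{v\}}}_{\text{change of basis matrix}} : \mathbb{R}^k \to \mathbb{R}^p \quad \left(\text{i.e., } \mathbb{R}^k \xrightarrow{\ \Phi_{\{v\}}\ } V \xrightarrow{\ \Phi_{\{w\}}^{-1}\ } \mathbb{R}^p\right), \tag{2.6.18}$$

is invertible. (Indeed, we can undo the transformation, using $\Phi_{\{v\}}^{-1} \circ \Phi_{\{w\}}$.) But
it is given by a p × k matrix (since it takes us from $\mathbb{R}^k$ to $\mathbb{R}^p$), and we know
that a matrix can be invertible only if it is square. Thus k = p. □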


Remark. There is something a bit miraculous about this proof; we are able 
to prove an important result about abstract vector spaces, using a matrix that 
seemed to drop out of the sky. Without the material developed earlier in this 
chapter, this result would be quite difficult to prove. The realization that the 
dimension of a vector space needed to be well defined was a turning point in 
the development of linear algebra. Dedekind's proof of this theorem in 1893 
was a variant of row reduction. △
With our definition (Definition
2.6.11), a basis is necessarily finite, 
but we could have allowed infinite 
bases. We stick to finite bases be- 
cause in infinite-dimensional vec- 
tor spaces, bases tend to be use- 
less. The interesting notion for 
infinite-dimensional vector spaces 
is not expressing an element of 
the space as a linear combination 
of a finite number of basis elements,
but expressing it as a linear
combination that uses infinitely
many basis vectors, i.e., as an
infinite series (for example,
power series or Fourier series).
This introduces questions of
convergence, which are interesting 
indeed, but a bit foreign to the 
spirit of linear algebra. 

It is quite surprising that there 
is a one to one and onto map from 
$\mathbb{R}$ to $C[0, 1]$; the infinities of elements
they have are not different
infinities. But this map is not lin- 
ear. Actually, it is already surpris- 
ing that the infinities of points in 
R and in R 2 are equal; this is il- 
lustrated by the existence of Peano 
curves, described in Exercise 0.4.5. 
Analogs of Peano curves can be 
constructed in C[0, 1]. 


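Example 2.6.18 (Change of basis). Let us see that the matrix A in the
proof of Proposition and Definition 2.4.20 (Equation 2.4.28) is indeed the change
of basis matrix

$$\Phi_{\{v\}}^{-1} \circ \Phi_{\{w\}} : \mathbb{R}^p \to \mathbb{R}^k, \tag{2.6.19}$$

expressing the new vectors (the w’s) in terms of the old (the v’s).

Like any linear transformation $\mathbb{R}^p \to \mathbb{R}^k$, the transformation $\Phi_{\{v\}}^{-1} \circ \Phi_{\{w\}}$ has
a k × p matrix A whose jth column is $A(\mathbf{e}_j)$. This means

$$\begin{bmatrix} a_{1,j} \\ \vdots \\ a_{k,j} \end{bmatrix} = A(\mathbf{e}_j) = \Phi_{\{v\}}^{-1} \circ \Phi_{\{w\}}(\mathbf{e}_j) = \Phi_{\{v\}}^{-1}(\mathbf{w}_j), \tag{2.6.20}$$

or, multiplying the first and last term above by $\Phi_{\{v\}}$,

$$\Phi_{\{v\}}\left(\begin{bmatrix} a_{1,j} \\ \vdots \\ a_{k,j} \end{bmatrix}\right) = a_{1,j}\mathbf{v}_1 + \cdots + a_{k,j}\mathbf{v}_k = \mathbf{w}_j. \qquad \triangle \tag{2.6.21}$$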
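To make the change of basis matrix concrete, here is a small sketch for two bases of $\mathbb{R}^2$. The bases `v` and `w` and the helper names `solve_2x2` and `change_of_basis` are hypothetical choices made for illustration; each column of the matrix is obtained by solving $a_{1,j}\mathbf{v}_1 + a_{2,j}\mathbf{v}_2 = \mathbf{w}_j$, here with Cramer's rule rather than row reduction.

```python
from fractions import Fraction

def solve_2x2(columns, target):
    """Return (a, b) with a*columns[0] + b*columns[1] == target (Cramer's rule)."""
    (p, q), (r, s) = columns          # columns[0] = (p, q), columns[1] = (r, s)
    x, y = target
    det = Fraction(p * s - r * q)     # nonzero when the columns form a basis
    return ((x * s - r * y) / det, (p * y - x * q) / det)

def change_of_basis(old_basis, new_basis):
    """Matrix of Phi_v^{-1} o Phi_w, returned as a list of its columns."""
    return [solve_2x2(old_basis, w_j) for w_j in new_basis]

v = [(1, 1), (1, -1)]     # old basis (hypothetical choice)
w = [(2, 0), (0, 2)]      # new basis (hypothetical choice)
A = change_of_basis(v, w)

# Each column (a_1j, a_2j) should rebuild w_j as a_1j*v_1 + a_2j*v_2.
rebuilt = [(a * v[0][0] + b * v[1][0], a * v[0][1] + b * v[1][1]) for a, b in A]
```

The list `rebuilt` reproduces the new basis vectors, which is exactly the content of Equation 2.6.21.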


Example 2.6.19 (Dimension of vector spaces). The space Mat(n, m) is a
vector space of dimension nm. The space $P_k$ of polynomials of degree at most
k is a vector space of dimension k + 1. △

Earlier we talked a bit loosely of “finite-dimensional” and “infinite-dimen- 
sional” vector spaces. Now we can be precise: a vector space is finite dimen- 
sional if it has a finite basis, and it is infinite dimensional if it does not. 

Example 2.6.20 (An infinite-dimensional vector space). The vector 
space C[0, 1] of continuous functions on [0, 1], which we saw in Example 2.6.2, 
is infinite dimensional. Intuitively it is not hard to see that there are too many 
such functions to be expressed with any finite number of basis vectors. We can 
pin it down as follows. 

Assume functions $f_1, \dots, f_n$ are a basis, and pick n + 1 distinct points
$0 = x_1 < x_2 < \cdots < x_{n+1} = 1$ in [0, 1]. Then given any values $c_1, \dots, c_{n+1}$, there
certainly exists a continuous function f(x) with $f(x_i) = c_i$, for instance, the
piecewise linear one whose graph consists of the line segments joining up the
points $(x_i, c_i)$.

If we can write $f = \sum_{k=1}^{n} a_k f_k$, then evaluating at the $x_i$, we get

$$f(x_i) = c_i = \sum_{k=1}^{n} a_k f_k(x_i), \quad i = 1, \dots, n+1. \tag{2.6.22}$$

This, for given $c_i$’s, is a system of n + 1 equations for the n unknowns $a_1, \dots, a_n$;
we know by Theorem 2.2.4 that for appropriate $c_i$’s the equations will be
incompatible. Therefore there are functions that are not linear combinations of
$f_1, \dots, f_n$, so $f_1, \dots, f_n$ do not span C[0, 1].
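The incompatibility can be seen in a tiny instance. The sketch below takes the hypothetical candidate functions $f_1(x) = 1$ and $f_2(x) = x$ (so n = 2), three sample points, and the values of a "tent" function; these choices are ours, not the text's. It row reduces the augmented matrix and checks for a pivot in the last column, which is the usual signal that a linear system has no solution.

```python
from fractions import Fraction

def row_reduce(matrix):
    """Return the row-reduced echelon form and the list of pivot columns."""
    rows = [[Fraction(entry) for entry in row] for row in matrix]
    pivots, r = [], 0
    for col in range(len(rows[0])):
        pivot_row = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot_row is None:
            continue                      # no pivot in this column
        rows[r], rows[pivot_row] = rows[pivot_row], rows[r]
        rows[r] = [entry / rows[r][col] for entry in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
        if r == len(rows):
            break
    return rows, pivots

# Hypothetical candidates f_1(x) = 1, f_2(x) = x, sampled at n + 1 = 3 points,
# with the values c_i of a piecewise linear "tent" function.
points = [Fraction(0), Fraction(1, 2), Fraction(1)]
c = [0, 1, 0]
augmented = [[1, x, ci] for x, ci in zip(points, c)]   # rows [f_1(x_i), f_2(x_i) | c_i]
_, pivots = row_reduce(augmented)
incompatible = (len(augmented[0]) - 1) in pivots       # pivot in the last column
```

Here `incompatible` comes out true: no combination $a_1 + a_2 x$ takes the values 0, 1, 0 at the three points.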


2.7 Newton’s Method 


Recall that the derivative
$[D\mathbf{f}(\mathbf{a}_0)]$ is a matrix, the Jacobian
matrix, whose entries are the partial
derivatives of $\mathbf{f}$ at $\mathbf{a}_0$. The increment
to the variable, $\mathbf{x} - \mathbf{a}_0$, is
a vector.


When John Hubbard was teaching first year calculus in France in 1976,
he wanted to include some numerical content in the curriculum. Those 
were the early days of programmable calculators; computers for under- 
graduates did not then exist. Newton's method to solve cubic polynomials 
just about fit into the 50 steps of program and eight memory registers 
available, so he used that as his main example. Writing the program was 
already a problem, but then came the question of the place to start: what 
should the initial guess be? 

At the time he assumed that even though he didn't know where to start, 
the experts surely did; after all, Newton's method was in practical use all 
over the place. It took some time to discover that no one knew anything 
about the global behavior of Newton's method. A natural thing to do was 
to color each point of the complex plane according to what root (if any) 
starting at that point led to. (But this was before the time of color screens 
and color printers: what he actually did was to print some character at 
every point of some grid: x’s and 0’s, for example.)

The resulting printouts were the first pictures of fractals arising from 
complex dynamical systems, with its archetype the Mandelbrot set. 


We put an arrow over f to in- 
dicate that elements of the range 
of $\mathbf{f}$ are vectors; $\mathbf{a}_0$ is a point and
$\mathbf{f}(\mathbf{a}_0)$ is a vector. In this way $\mathbf{f}$ is
like a vector field, taking a point
and giving a vector. But whereas 
a vector field f takes a point in 
one space and turns it into a vector 
in the same space, the domain and 
range of f can be different spaces, 
with different units. The only re- 
quirement is that there must be as 
many equations as unknowns: the 
dimensions of the two spaces must 
be equal. Newton’s method has 
such wide applicability that being 
more precise is impossible. 


Theorem 2.2.4 gives a quite complete understanding of linear equations. In 
practice, one often wants to solve nonlinear equations. This is a genuinely hard 
problem, and when confronted with such equations, the usual response is: apply 
Newton’s method and hope for the best. 

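Let $\mathbf{f}$ be a differentiable function from $\mathbb{R}^n$ (or from an open subset of $\mathbb{R}^n$)
to $\mathbb{R}^n$. Newton’s method consists of starting with some guess $\mathbf{a}_0$ for a solution
of $\mathbf{f}(\mathbf{x}) = \mathbf{0}$. Then linearize the equation at $\mathbf{a}_0$: replace the increment to the
function, $\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{a}_0)$, by a linear function of the increment, $[D\mathbf{f}(\mathbf{a}_0)](\mathbf{x} - \mathbf{a}_0)$.
Now solve the corresponding linear equation:

$$\mathbf{f}(\mathbf{a}_0) + [D\mathbf{f}(\mathbf{a}_0)](\mathbf{x} - \mathbf{a}_0) = \mathbf{0}. \tag{2.7.1}$$

This is a system of n linear equations in n unknowns. We can rewrite it as

$$[D\mathbf{f}(\mathbf{a}_0)](\mathbf{x} - \mathbf{a}_0) = -\mathbf{f}(\mathbf{a}_0). \tag{2.7.2}$$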
Note that Newton’s method requires
inverting a matrix, which is
a lot harder than inverting a num- 
ber; this is why Newton’s method 
is so much harder in higher dimen- 
sions than in one dimension. 

In practice, rather than find
the inverse of $[D\mathbf{f}(\mathbf{a}_0)]$, one solves
Equation 2.7.1 by row reduction,
or better, by partial row reduction 
and back substitution, discussed 
in Exercise 2.1.9. When applying 
Newton’s method, the vast major- 
ity of the computational time is 
spent doing row operations. 


How do you come by your initial
guess $\mathbf{a}_0$? You might have a
good reason to think that nearby
there is a solution, for instance
because $|\mathbf{f}(\mathbf{a}_0)|$ is small; we will see
many examples of this later: in
good cases you can then prove that
the scheme works. Or it might
be wishful thinking: you know
roughly what solution you want.
Or you might pull your guess out
of thin air, and start with a collection
of initial guesses $\mathbf{a}_0$, hoping
that you will be lucky and that at
least one will converge. In some
cases, this is just a hope.


Remember that if a matrix A has an inverse $A^{-1}$, then for any $\mathbf{b}$ the equation
$A\mathbf{x} = \mathbf{b}$ has the unique solution $A^{-1}\mathbf{b}$, as discussed in Section 2.3. So if $[D\mathbf{f}(\mathbf{a}_0)]$
is invertible, which will usually be the case, then

$$\mathbf{x} = \mathbf{a}_0 - [D\mathbf{f}(\mathbf{a}_0)]^{-1}\mathbf{f}(\mathbf{a}_0). \tag{2.7.3}$$

Call this solution $\mathbf{a}_1$, use it as your new “guess,” and solve

$$[D\mathbf{f}(\mathbf{a}_1)](\mathbf{x} - \mathbf{a}_1) = -\mathbf{f}(\mathbf{a}_1), \tag{2.7.4}$$

calling the solution $\mathbf{a}_2$, and so on. The hope is that $\mathbf{a}_1$ is a better approximation
to a root than $\mathbf{a}_0$, and that the sequence $\mathbf{a}_0, \mathbf{a}_1, \dots$ converges to a root of the
equation. This hope is sometimes justified on theoretical grounds, and actually
works much more often than any theory explains.
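The iteration just described can be sketched in a few lines. This is an illustrative implementation under our own assumptions, not a production solver: the helper names `solve_linear` and `newton`, the fixed number of steps, and the test system (a circle of radius 2 intersected with the line x = y) are all hypothetical. Each step solves the linear system $[D\mathbf{f}(\mathbf{a}_k)]\mathbf{h} = -\mathbf{f}(\mathbf{a}_k)$ by elimination and back substitution instead of forming an inverse matrix.

```python
def solve_linear(A, b):
    """Solve A x = b by row reduction with partial pivoting (A is n x n)."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        if M[pivot][col] == 0.0:
            raise ValueError("derivative is not invertible at this point")
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            factor = M[i][col] / M[col][col]
            M[i] = [a - factor * c for a, c in zip(M[i], M[col])]
    x = [0.0] * n
    for i in reversed(range(n)):          # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def newton(f, Df, a0, steps=20):
    """Iterate a_{k+1} = a_k + h, where [Df(a_k)] h = -f(a_k)."""
    a = list(a0)
    for _ in range(steps):
        h = solve_linear(Df(a), [-value for value in f(a)])
        a = [ai + hi for ai, hi in zip(a, h)]
    return a

# Hypothetical test system: x^2 + y^2 = 4 and x = y, with root (sqrt 2, sqrt 2).
f = lambda p: [p[0] ** 2 + p[1] ** 2 - 4.0, p[0] - p[1]]
Df = lambda p: [[2.0 * p[0], 2.0 * p[1]], [1.0, -1.0]]
root = newton(f, Df, [1.0, 2.0])
```

A fixed step count keeps the sketch short; a practical version would stop once the increment is small and would report failure if it never is.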

Example 2.7.1 (Finding a square root). How do calculators compute
the square root of a positive number b? They apply Newton’s method to the
equation $f(x) = x^2 - b = 0$. In this case, this means the following: choose
$a_0$ and plug it into Equation 2.7.2. Our equation is in one variable, so we can
replace $[Df(a_0)]$ by $f'(a_0) = 2a_0$, as shown in Equation 2.7.5.

This method is sometimes introduced in middle school, under the name divide 
and average. 



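$$a_1 = \underbrace{a_0 - \frac{1}{2a_0}\left(a_0^2 - b\right)}_{\text{Newton’s method}} = \underbrace{\frac{1}{2}\left(a_0 + \frac{b}{a_0}\right)}_{\text{divide and average}} \tag{2.7.5}$$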


(Exercise 2.7.3 asks you to find the corresponding formula for nth roots.) 

The motivation for divide and average is the following: let a be a first guess
at $\sqrt{b}$. If your guess is too big, i.e., if $a > \sqrt{b}$, then b/a will be too small, and the
average of the two will be better than the original guess. This seemingly naive
explanation is quite solid and can easily be turned into a proof that Newton’s
method works in this case.
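Divide and average is short enough to run by hand or in code. The sketch below is a minimal illustration; the function name `divide_and_average`, the choice b = 2 with first guess 3, and the number of steps are assumptions made for the example.

```python
def divide_and_average(b, a0, steps=6):
    """Newton's method for x^2 - b = 0: replace a by the average of a and b/a."""
    a = float(a0)
    history = [a]
    for _ in range(steps):
        a = 0.5 * (a + b / a)
        history.append(a)
    return history

guesses = divide_and_average(2.0, 3.0)   # hypothetical choice: b = 2, a0 = 3
```

Starting above $\sqrt{2}$, the successive guesses decrease toward $\sqrt{2}$ while staying above it, which is the behavior the argument in the text establishes.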

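Suppose first that $a_0 > \sqrt{b}$; then we want to show that $\sqrt{b} < a_1 < a_0$. Since
$a_1 = \frac{1}{2}(a_0 + b/a_0)$, this comes down to showing

$$\sqrt{b} < \frac{1}{2}\left(a_0 + \frac{b}{a_0}\right) < a_0,$$

or, if you develop, $4b < a_0^2 + 2b + \frac{b^2}{a_0^2} < 4a_0^2$. To see the left-hand inequality,
subtract 4b from each side:

$$a_0^2 + 2b + \frac{b^2}{a_0^2} - 4b = a_0^2 - 2b + \frac{b^2}{a_0^2} = \left(a_0 - \frac{b}{a_0}\right)^2 > 0. \tag{2.7.7}$$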
Two theorems from first
year calculus.

(1) If a decreasing sequence is 
bounded below, it converges (see 
Theorem 0.4.10). 

(2) If $a_n$ is a convergent sequence,
and f is a continuous
function in a neighborhood of the
limit of the $a_n$, then

$$\lim f(a_n) = f(\lim a_n).$$



Figure 2.7.1. 


Newton’s method: each time
we calculate $a_{n+1}$ from $a_n$ we are
calculating the intersection with
the x-axis of the line tangent to
the parabola $y = x^2 - b$ at $a_n$.


The right-hand inequality follows immediately from $b < a_0^2$, hence $b^2/a_0^2 < a_0^2$:

$$a_0^2 + \underbrace{2b}_{<\,2a_0^2} + \underbrace{\frac{b^2}{a_0^2}}_{<\,a_0^2} < 4a_0^2. \tag{2.7.8}$$

Recall from first year calculus (or from Theorem 0.4.10) that if a decreasing
sequence is bounded below, it converges. Hence the $a_i$ converge. The limit a
must satisfy

$$a = \lim_{i\to\infty} a_{i+1} = \lim_{i\to\infty} \frac{1}{2}\left(a_i + \frac{b}{a_i}\right) = \frac{1}{2}\left(a + \frac{b}{a}\right), \quad\text{i.e.,}\quad a = \sqrt{b}. \tag{2.7.9}$$

What if you choose $0 < a_0 < \sqrt{b}$? In this case as well, $a_1 > \sqrt{b}$:

$$4a_0^2 < 4b < \underbrace{a_0^2 + 2b + \frac{b^2}{a_0^2}}_{4a_1^2}. \tag{2.7.10}$$

We get the right-hand inequality using the same argument used in Equation
2.7.7: $2b < a_0^2 + \frac{b^2}{a_0^2}$, since subtracting 2b from both sides gives $0 < \left(a_0 - \frac{b}{a_0}\right)^2$.
Then the same argument as above applies to show that $a_2 < a_1$.

This “divide and average” method can be interpreted geometrically in terms
of Newton’s method: Each time we calculate $a_{n+1}$ from $a_n$ we are calculating
the intersection with the x-axis of the line tangent to the parabola $y = x^2 - b$
at $a_n$, as shown in Figure 2.7.1.

There aren’t many cases where Newton’s method is really well understood 
far away from the roots; Example 2.7.2 shows one of the problems that can 
arise, and there are many others. 


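Example 2.7.2 (A case where Newton’s method doesn’t work). Let’s
apply Newton’s method to the equation

$$x^3 - x + \frac{\sqrt{2}}{2} = 0, \tag{2.7.11}$$

starting at x = 0 (i.e., our “guess” $a_0$ is 0). The derivative is $f'(x) = 3x^2 - 1$, so
$f'(0) = -1$ and $f(0) = \sqrt{2}/2$, giving

$$a_1 = a_0 - \frac{1}{f'(a_0)} f(a_0) = 0 + \frac{\sqrt{2}}{2} = \frac{\sqrt{2}}{2}. \tag{2.7.12}$$

Since $a_1 = \sqrt{2}/2$, we have $f'(a_1) = 1/2$, and

$$a_2 = \frac{\sqrt{2}}{2} - 2\left(\frac{\sqrt{2}}{4} - \frac{\sqrt{2}}{2} + \frac{\sqrt{2}}{2}\right) = 0. \tag{2.7.13}$$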
Don’t be too discouraged by
this example. Most of the time 
Newton’s method does work. It 
is the best method available for 
solving nonlinear equations. 


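We’re back to where we started, at $a_0 = 0$. If we continue, we’ll bounce back
and forth between $\frac{\sqrt{2}}{2}$ and 0, never converging to any root:

$$0 \to \frac{\sqrt{2}}{2} \to 0 \to \frac{\sqrt{2}}{2} \to \cdots. \tag{2.7.14}$$

Now let’s try starting at some small ε > 0. We have $f'(\varepsilon) = 3\varepsilon^2 - 1$, and
$f(\varepsilon) = \varepsilon^3 - \varepsilon + \sqrt{2}/2$, giving us

$$a_1 = \varepsilon - \frac{1}{3\varepsilon^2 - 1}\left(\varepsilon^3 - \varepsilon + \frac{\sqrt{2}}{2}\right) = \varepsilon + \left(\varepsilon^3 - \varepsilon + \frac{\sqrt{2}}{2}\right)\frac{1}{1 - 3\varepsilon^2}. \tag{2.7.15}$$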


This uses Equation 0.4.9, the
sum of a geometric series: If |r| <
1, then

$$\sum_{n=0}^{\infty} a r^n = \frac{a}{1 - r}.$$

We can substitute $3\varepsilon^2$ for r in that
equation because $\varepsilon^2$ is small.


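Now we can treat $\frac{1}{1 - 3\varepsilon^2}$ as the sum of the geometric series $(1 + 3\varepsilon^2 + 9\varepsilon^4 + \cdots)$.
This gives us

$$a_1 = \varepsilon + \left(\varepsilon^3 - \varepsilon + \frac{\sqrt{2}}{2}\right)\left(1 + 3\varepsilon^2 + 9\varepsilon^4 + \cdots\right). \tag{2.7.16}$$

Now we just ignore terms that are smaller than $\varepsilon^2$, getting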


Remember that we said in the
introduction to Section 1.4, that
calculus is about “. . . some
terms being dominant or negligible
compared to other terms.”
Just because a computation is
there doesn’t mean we have to do
it to the bitter end.


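$$a_1 = \varepsilon + \left(\frac{\sqrt{2}}{2} - \varepsilon\right)(1 + 3\varepsilon^2) + \text{remainder} = \frac{\sqrt{2}}{2} + \frac{3\sqrt{2}\,\varepsilon^2}{2} + \text{remainder}. \tag{2.7.17}$$

Ignoring the remainder, and repeating the process, we get

$$a_2 = \frac{\sqrt{2}}{2} + \frac{3\sqrt{2}\,\varepsilon^2}{2} - \frac{\left(\frac{\sqrt{2}}{2} + \frac{3\sqrt{2}}{2}\varepsilon^2\right)^3 - \left(\frac{\sqrt{2}}{2} + \frac{3\sqrt{2}}{2}\varepsilon^2\right) + \frac{\sqrt{2}}{2}}{3\left(\frac{\sqrt{2}}{2} + \frac{3\sqrt{2}}{2}\varepsilon^2\right)^2 - 1}. \tag{2.7.18}$$

This looks unpleasant; let’s throw out all the terms with $\varepsilon^2$. We get

$$a_2 = \frac{\sqrt{2}}{2} - \frac{\left(\frac{\sqrt{2}}{2}\right)^3 - \frac{\sqrt{2}}{2} + \frac{\sqrt{2}}{2}}{3\left(\frac{\sqrt{2}}{2}\right)^2 - 1} = 0, \quad\text{so that}\quad a_2 = 0 + c\varepsilon^2, \text{ where } c \text{ is a constant.} \tag{2.7.19}$$

We started at 0 + ε and we’ve been sent back to $0 + c\varepsilon^2$!

If we continue, we’ll bounce between
a region around $\frac{\sqrt{2}}{2}$ and a
region around 0, getting closer and
closer to these points each time.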

We’re not getting anywhere; does that mean there are no roots? Not at all.¹⁷
Let’s try once more, with $a_0 = -1$. We have

$$a_1 = a_0 - \frac{a_0^3 - a_0 + \frac{\sqrt{2}}{2}}{3a_0^2 - 1} = \frac{2a_0^3 - \frac{\sqrt{2}}{2}}{3a_0^2 - 1}. \tag{2.7.20}$$

¹⁷ Of course not. All odd-degree polynomials have real roots by the intermediate
value theorem, Theorem 0.4.12.
As we said earlier, the reason
Newton’s method has become so 
important is that people no longer 
have to carry out the computa- 
tions by hand. 


Any statement that guarantees 
that you can find solutions to non- 
linear equations in any generality 
at all is bound to be tremendously 
important. In addition to the im- 
mediate applicability of Newton’s 
method to solutions of all sorts 
of nonlinear equations, it gives a 
practical algorithm for finding im- 
plicit and inverse functions. Kan- 
torovitch’s theorem then gives a 
proof that these algorithms actu- 
ally work. 


A computer or programmable calculator can be programmed to keep iterating 
this formula. It’s slightly more tedious with a simple scientific calculator; with 
the one the authors have at hand, we enter “1 +/— Min” to put -1 in the 
memory (“MR”) and then: 

(2 × MR × MR × MR − √2 div 2) div (3 × MR × MR − 1).

We get $a_1 = -1.35355\ldots$; entering that in memory by pushing on the “Min”
(or “memory in”) key we repeat the process to get:

$$\begin{aligned} a_2 &= -1.26032\ldots & a_4 &= -1.25107\ldots \\ a_3 &= -1.25116\ldots & a_5 &= -1.25107\ldots \end{aligned} \tag{2.7.21}$$

It’s then simple to confirm that $a_5$ is indeed a root, to the limits of precision of
the calculator or computer. △
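Both behaviors in this example are easy to reproduce. The sketch below applies Newton's method to $x^3 - x + \sqrt{2}/2$ from the two starting points discussed; the helper name `newton_orbit` and the choice of six steps are our own assumptions, and the numbers are floating-point approximations.

```python
import math

C = math.sqrt(2) / 2

def f(x):
    return x ** 3 - x + C

def f_prime(x):
    return 3 * x ** 2 - 1

def newton_orbit(a0, steps):
    """Return [a0, a1, ..., a_steps] for Newton's method applied to f."""
    orbit = [float(a0)]
    for _ in range(steps):
        a = orbit[-1]
        orbit.append(a - f(a) / f_prime(a))
    return orbit

cycle = newton_orbit(0.0, 6)        # bounces between 0 and sqrt(2)/2
converging = newton_orbit(-1.0, 6)  # settles near -1.25107
```

In floating point the even-numbered terms of `cycle` are not exactly 0, only within round-off of it, which is consistent with the remark that a start near 0 is sent back even closer to 0.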

Does Newton’s method depend on starting with a lucky guess? Luck some- 
times enters into it; with a fast computer one can afford to try out several 
guesses and see if one converges. But, you may ask, how do we really know 
that solutions are converging? Checking by plugging in a root into the equa- 
tion isn’t entirely convincing, because of round-off errors. We shall see that 
we can say something more precise. Kantorovitch’s theorem guarantees that 
under appropriate circumstances Newton’s method converges. Even stating the 
theorem is difficult. But the effort will pay off. 

Lipschitz conditions 

Imagine an airplane beginning its approach to its destination, its altitude
represented by f. If it loses altitude gradually, the derivative f′ allows one to
approximate the function very well; if you know how high the airplane is at the
moment t, and what its derivative is at t, you can get a good idea of how high
the airplane will be at the moment t + h:

$$f(t + h) \approx f(t) + f'(t)h. \tag{2.7.22}$$

But if the airplane suddenly loses power and starts plummeting to earth, the
derivative changes abruptly: the derivative of f at t will no longer be a reliable
gauge of the airplane’s altitude a few seconds later.

The natural way to limit how fast the derivative can change is to bound the
second derivative; you probably ran into this when studying Taylor’s theorem
with remainder. In one variable this is a good idea. If you put an appropriate
limit to f″ at t, then the airplane will not suddenly change altitude. Bounding
the second derivative of an airplane’s altitude function is indeed a pilot’s
primary goal, except in rare emergencies.

To guarantee that Newton’s method starting at a certain point will converge
to a root, we will need an explicit bound on how good an approximation

$$[D\mathbf{f}(\mathbf{x}_0)]\mathbf{h} \quad\text{is to}\quad \mathbf{f}(\mathbf{x}_0 + \mathbf{h}) - \mathbf{f}(\mathbf{x}_0). \tag{2.7.23}$$
In Definition 2.7.3, U can be
a subset of R n ; the domain and 
the range of f do not need to have 
the same dimension. But when 
we use this definition in the Kan- 
torovitch theorem, those dimen- 
sions will have to be the same. 


As in the case of the airplane, to do this we will need some assumption on how 
fast the derivative of f changes. 

But in several variables there are lots of second derivatives, so bounding the 
second derivative doesn’t work so well. We will adopt a different approach: 
demanding that the derivative of f satisfy a Lipschitz condition. 

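Definition 2.7.3 (Lipschitz condition). Let $\mathbf{f} : U \to \mathbb{R}^m$ be a differentiable
mapping. The derivative $[D\mathbf{f}(\mathbf{x})]$ satisfies a Lipschitz condition on a
subset $V \subset U$ with Lipschitz ratio M if for all $\mathbf{x}, \mathbf{y} \in V$

$$\underbrace{\big|[D\mathbf{f}(\mathbf{x})] - [D\mathbf{f}(\mathbf{y})]\big|}_{\text{distance between derivatives}} \le M \underbrace{|\mathbf{x} - \mathbf{y}|}_{\text{distance between points}}. \tag{2.7.24}$$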


A Lipschitz ratio tells us some- 
thing about how fast the deriva- 
tive of a function changes. 

It is often called a Lipschitz 
constant. But M is not a true con- 
stant; it depends on the problem 
at hand; in addition, a mapping 
will almost always have different 
M at different points or on differ- 
ent regions. When there is a sin- 
gle Lipschitz ratio that works on 
all of R n , we will call it a global 
Lipschitz ratio. 

Example 2.7.4 is misleading: 
there is usually no Lipschitz ratio 
valid on the entire space. 


Note that a function whose derivative satisfies a Lipschitz condition is cer- 
tainly continuously differentiable. Having the derivative Lipschitz is a require- 
ment that the derivative is especially nicely continuous (it is actually close to 
demanding that the function be twice continuously differentiable). 


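Example 2.7.4 (Lipschitz ratio: a simple case). Consider the mapping
$\mathbf{f} : \mathbb{R}^2 \to \mathbb{R}^2$

$$\mathbf{f}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 - x_2^2 \\ x_1^2 + x_2 \end{pmatrix} \quad\text{with derivative}\quad \left[D\mathbf{f}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}\right] = \begin{bmatrix} 1 & -2x_2 \\ 2x_1 & 1 \end{bmatrix}. \tag{2.7.25}$$

Given two points $\mathbf{x}$ and $\mathbf{y}$,

$$\left[D\mathbf{f}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}\right] - \left[D\mathbf{f}\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}\right] = \begin{bmatrix} 0 & -2(x_2 - y_2) \\ 2(x_1 - y_1) & 0 \end{bmatrix}. \tag{2.7.26}$$

Calculating the length of the matrix above gives

$$\left|\begin{bmatrix} 0 & -2(x_2 - y_2) \\ 2(x_1 - y_1) & 0 \end{bmatrix}\right| = 2\sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2} = 2\left|\begin{pmatrix} x_1 - y_1 \\ x_2 - y_2 \end{pmatrix}\right|,$$

so

$$\big|[D\mathbf{f}(\mathbf{x})] - [D\mathbf{f}(\mathbf{y})]\big| = 2|\mathbf{x} - \mathbf{y}|; \tag{2.7.27}$$

in this case M = 2 is a Lipschitz ratio for $[D\mathbf{f}]$.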
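The identity in Equation 2.7.27 can be checked numerically at any pair of points. The sketch below does so at two arbitrarily chosen sample points; the helper names `Df` and `matrix_length` are ours, and the length of a matrix is taken to be the square root of the sum of the squares of its entries, as in the computation above.

```python
import math

def Df(x1, x2):
    """Jacobian matrix of f(x1, x2) = (x1 - x2^2, x1^2 + x2)."""
    return [[1.0, -2.0 * x2], [2.0 * x1, 1.0]]

def matrix_length(A, B):
    """Length |A - B|: square root of the sum of squared entry differences."""
    return math.sqrt(sum((a - b) ** 2
                         for row_a, row_b in zip(A, B)
                         for a, b in zip(row_a, row_b)))

x, y = (3.0, -1.0), (0.5, 2.0)     # two arbitrary sample points
lhs = matrix_length(Df(*x), Df(*y))
rhs = 2.0 * math.dist(x, y)
```

Both `lhs` and `rhs` equal $\sqrt{61}$ here, up to round-off.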


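Example 2.7.5 (Lipschitz ratio: a more complicated case). Consider
the mapping $\mathbf{f} : \mathbb{R}^2 \to \mathbb{R}^2$ given by

$$\mathbf{f}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 - x_2^3 \\ x_1^3 + x_2 \end{pmatrix}, \quad\text{with derivative}\quad \left[D\mathbf{f}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}\right] = \begin{bmatrix} 1 & -3x_2^2 \\ 3x_1^2 & 1 \end{bmatrix}. \tag{2.7.28}$$

Given two points $\mathbf{x}$ and $\mathbf{y}$ we have

$$\left[D\mathbf{f}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}\right] - \left[D\mathbf{f}\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}\right] = \begin{bmatrix} 0 & -3(x_2^2 - y_2^2) \\ 3(x_1^2 - y_1^2) & 0 \end{bmatrix}, \tag{2.7.29}$$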
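and taking the length gives

$$\left|\left[D\mathbf{f}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}\right] - \left[D\mathbf{f}\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}\right]\right| = 3\sqrt{(x_1 - y_1)^2(x_1 + y_1)^2 + (x_2 - y_2)^2(x_2 + y_2)^2}. \tag{2.7.30}$$

Therefore, when

$$(x_1 + y_1)^2 \le A^2 \quad\text{and}\quad (x_2 + y_2)^2 \le A^2, \tag{2.7.31}$$

as shown in Figure 2.7.2, we have

$$\left|\left[D\mathbf{f}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}\right] - \left[D\mathbf{f}\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}\right]\right| \le 3A\left|\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} - \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}\right|, \tag{2.7.32}$$

i.e., 3A is a Lipschitz ratio for $[D\mathbf{f}(\mathbf{x})]$.

Figure 2.7.2.
Equations 2.7.31 and 2.7.32 say
that when $\begin{pmatrix} x_1 \\ y_1 \end{pmatrix}$ and $\begin{pmatrix} x_2 \\ y_2 \end{pmatrix}$ are
in the shaded region above, then
3A is a Lipschitz ratio for $[D\mathbf{f}(\mathbf{x})]$.
It is not immediately obvious how
to translate this statement into a
condition on the points
$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ and $\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}$.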

When is the condition of Equation 2.7.31 satisfied? It isn’t really clear that
we need to ask this question: why can’t we just say that it is satisfied when it
is satisfied; in what sense can we be more explicit? There isn’t anything wrong
with this view, but the requirement in Equation 2.7.31 describes some more or
less unimaginable region in $\mathbb{R}^4$. (Keep in mind that Equation 2.7.32 concerns
points $\mathbf{x}$ with coordinates $x_1, x_2$ and $\mathbf{y}$ with coordinates $y_1, y_2$, not the points of
Figure 2.7.2, which have coordinates $x_1, y_1$ and $x_2, y_2$ respectively.) Moreover,
in many settings, what we really want is a ball of radius R such that when two
points are in the ball, the Lipschitz condition is satisfied:

$$\big|[D\mathbf{f}(\mathbf{x})] - [D\mathbf{f}(\mathbf{y})]\big| \le 3A|\mathbf{x} - \mathbf{y}| \quad\text{when } |\mathbf{x}| \le R \text{ and } |\mathbf{y}| \le R. \tag{2.7.33}$$

If we require that $|\mathbf{x}|^2 = x_1^2 + x_2^2 \le A^2/4$ and $|\mathbf{y}|^2 = y_1^2 + y_2^2 \le A^2/4$, then


By $\sup\{(x_1 + y_1)^2, (x_2 + y_2)^2\}$
we mean “the greater of $(x_1 + y_1)^2$
and $(x_2 + y_2)^2$.” In this computation,
we are using the fact that
for any two numbers a and b, we
always have

$$(a + b)^2 \le 2(a^2 + b^2),$$

since $0 \le (a - b)^2$.


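$$\sup\{(x_1 + y_1)^2, (x_2 + y_2)^2\} \le 2(x_1^2 + y_1^2 + x_2^2 + y_2^2) = 2(|\mathbf{x}|^2 + |\mathbf{y}|^2) \le A^2. \tag{2.7.34}$$

Thus we can assert that if

$$|\mathbf{x}|, |\mathbf{y}| \le \frac{A}{2}, \quad\text{then}\quad \big|[D\mathbf{f}(\mathbf{x})] - [D\mathbf{f}(\mathbf{y})]\big| \le 3A|\mathbf{x} - \mathbf{y}|. \qquad \triangle \tag{2.7.35}$$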
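The conclusion of Example 2.7.5 can be spot-checked by sampling. The sketch below draws random pairs of points in the ball of radius A/2 and records the largest observed ratio $|[D\mathbf{f}(\mathbf{x})] - [D\mathbf{f}(\mathbf{y})]| / |\mathbf{x} - \mathbf{y}|$. The value A = 2, the seed, the sample size, and the helper names are assumptions made for illustration; sampling can only fail to find a counterexample, it proves nothing.

```python
import math
import random

def Df(x1, x2):
    """Jacobian matrix of f(x1, x2) = (x1 - x2^3, x1^3 + x2)."""
    return [[1.0, -3.0 * x2 ** 2], [3.0 * x1 ** 2, 1.0]]

def matrix_length(A, B):
    """Square root of the sum of squared entry differences of A and B."""
    return math.sqrt(sum((a - b) ** 2
                         for row_a, row_b in zip(A, B)
                         for a, b in zip(row_a, row_b)))

def random_point_in_ball(radius, rng):
    """Rejection-sample a point p with |p| <= radius."""
    while True:
        p = (rng.uniform(-radius, radius), rng.uniform(-radius, radius))
        if math.hypot(*p) <= radius:
            return p

A = 2.0                      # hypothetical value of the constant A
rng = random.Random(0)       # fixed seed so the run is reproducible
worst = 0.0
for _ in range(10000):
    x = random_point_in_ball(A / 2, rng)
    y = random_point_in_ball(A / 2, rng)
    if x != y:
        worst = max(worst, matrix_length(Df(*x), Df(*y)) / math.dist(x, y))
```

The observed `worst` ratio stays below 3A, as Equation 2.7.35 guarantees it must.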

Computing Lipschitz ratios using higher partial derivatives 

Most students can probably follow the computation in Example 2.7.5 line by 
line, but even well above average students will probably feel that the tricks 
used are way beyond anything they can be expected to come up with on their 
own. Finding ratios M as we did above is a delicate art, and finding M’s 
that are as small as possible is harder yet. The manipulation of inequalities is 
a hard skill to acquire, and no one seems to know how to teach it very well. 
Fortunately, there is a systematic way to compute Lipschitz ratios, using higher 
partial derivatives. 

Higher partial derivatives are essential throughout mathematics and in sci- 
ence. Mathematical physics is essentially the theory of partial differential equa- 
tions. Electromagnetism is based on Maxwell’s equations, general relativity 
Originally we introduced higher
partial derivatives in a separate 
section in Chapter 3. They are so 
important in all scientific applica- 
tions of mathematics that it seems 
mildly scandalous to slip them in 
here, just to solve a computa- 
tional problem. But in our expe- 
rience students have such trouble 
computing Lipschitz ratios — each 
problem seeming to demand a new 
trick — that we feel it worthwhile 
to give a “recipe.” 

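Different notations for partial
derivatives exist:

$$D_j(D_i f)(\mathbf{a}) = \frac{\partial^2 f}{\partial x_j \partial x_i}(\mathbf{a}) = f_{x_i x_j}(\mathbf{a}).$$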

As usual, we specify the point a at 
which the derivative is evaluated. 


In Example 2.7.7 we evaluated
the partial derivatives of f at both
$\begin{pmatrix} a \\ b \\ c \end{pmatrix}$ and at $\begin{pmatrix} x \\ y \\ z \end{pmatrix}$ to emphasize
the fact that although we used x, y
and z to define f, we can evaluate
it on variables that look different.

Recall that a function is C 2 
if it is twice continuously differ- 
entiable, i.e., if its second partial 
derivatives exist and are continu- 
ous. 


on Einstein’s equation, fluid dynamics on the Navier-Stokes equation, quantum
mechanics on Schrödinger’s equation. Understanding partial differential
equations is a prerequisite for any serious study of these phenomena. Here,
however, we will use higher partial derivatives as a computational tool.

Definition 2.7.6 (Second partial derivative). Let $U \subset \mathbb{R}^n$ be open,
and $f : U \to \mathbb{R}$ be a differentiable function. If the function $D_i f$ is itself
differentiable, then its partial derivative with respect to the jth variable,

$$D_j(D_i f),$$

is called a second partial derivative of f.


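Example 2.7.7 (Second partial derivative). Let f be the function

$$f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = 2x + xy^3 + 2yz^2.$$

Then

$$D_2(D_1 f)\begin{pmatrix} a \\ b \\ c \end{pmatrix} = D_2(\underbrace{2 + b^3}_{D_1 f}) = 3b^2.$$

Similarly,

$$D_3(D_2 f)\begin{pmatrix} x \\ y \\ z \end{pmatrix} = D_3(\underbrace{3xy^2 + 2z^2}_{D_2 f}) = 4z. \qquad \triangle$$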
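The two second partial derivatives of Example 2.7.7 can be checked numerically by differencing twice. The sketch below uses central differences; the helper name `partial`, the step size, and the sample point are hypothetical choices, and the results are approximations rather than exact derivatives.

```python
def f(x, y, z):
    return 2 * x + x * y ** 3 + 2 * y * z ** 2

def partial(g, i, h=1e-4):
    """Central-difference approximation to D_i g, with i = 0, 1, 2 for x, y, z."""
    def Dg(*p):
        up = list(p)
        up[i] += h
        down = list(p)
        down[i] -= h
        return (g(*up) - g(*down)) / (2 * h)
    return Dg

D2_D1_f = partial(partial(f, 0), 1)   # D_2(D_1 f): first in x, then in y
D3_D2_f = partial(partial(f, 1), 2)   # D_3(D_2 f): first in y, then in z

a, b, c = 0.7, -1.3, 2.1              # arbitrary sample point
```

At this point `D2_D1_f(a, b, c)` is close to $3b^2$ and `D3_D2_f(a, b, c)` is close to $4c$, matching the formulas in the example.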


We can denote $D_1(D_1 f)$ by $D_1^2 f$, $D_2(D_2 f)$ by $D_2^2 f$, .... For the function
$f\begin{pmatrix} x \\ y \end{pmatrix} = xy^2 + \sin x$, what are $D_1^2 f$, $D_2^2 f$, $D_1(D_2 f)$, and $D_2(D_1 f)$?¹⁸

Proposition 2.7.8 says that the derivative of f is Lipschitz if f is of class C 2 . 


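Proposition 2.7.8 (Derivative of a $C^2$ mapping is Lipschitz). Let
$U \subset \mathbb{R}^n$ be open, and $\mathbf{f} : U \to \mathbb{R}^n$ be a $C^2$ mapping. If $|D_k D_j f_i(\mathbf{x})| \le c_{i,j,k}$
for all triples of indices $1 \le i, j, k \le n$, then

$$\big|[D\mathbf{f}(\mathbf{u})] - [D\mathbf{f}(\mathbf{v})]\big| \le \left(\sum_{1 \le i,j,k \le n} (c_{i,j,k})^2\right)^{1/2} |\mathbf{u} - \mathbf{v}|. \tag{2.7.36}$$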


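Proof. Each of the $D_j f_i$ is a scalar-valued function, and Corollary 1.9.2 tells
us that

$$|D_j f_i(\mathbf{a} + \mathbf{h}) - D_j f_i(\mathbf{a})| \le \left(\sum_{k=1}^{n} (c_{i,j,k})^2\right)^{1/2} |\mathbf{h}|. \tag{2.7.37}$$

By definition

$$\big|[D\mathbf{f}(\mathbf{a} + \mathbf{h})] - [D\mathbf{f}(\mathbf{a})]\big| = \left(\sum_{i,j=1}^{n} \big(D_j f_i(\mathbf{a} + \mathbf{h}) - D_j f_i(\mathbf{a})\big)^2\right)^{1/2}. \tag{2.7.38}$$

So

$$\big|[D\mathbf{f}(\mathbf{a} + \mathbf{h})] - [D\mathbf{f}(\mathbf{a})]\big| \le \left(\sum_{i,j=1}^{n} \left(\left(\sum_{k=1}^{n} (c_{i,j,k})^2\right)^{1/2} |\mathbf{h}|\right)^2\right)^{1/2} = \left(\sum_{1 \le i,j,k \le n} (c_{i,j,k})^2\right)^{1/2} |\mathbf{h}|. \tag{2.7.39}$$

The proposition follows by setting $\mathbf{u} = \mathbf{a} + \mathbf{h}$, and $\mathbf{v} = \mathbf{a}$. □

Equation 2.7.37 uses the fact
that for any function g (in our
case, $D_j f_i$),

$$|g(\mathbf{a} + \mathbf{h}) - g(\mathbf{a})| \le \left(\sup_{t \in [0,1]} \big|[Dg(\mathbf{a} + t\mathbf{h})]\big|\right) |\mathbf{h}|;$$

remember that

$$\big|[Dg(\mathbf{a} + t\mathbf{h})]\big| = \left(\sum_{k=1}^{n} \big(D_k g(\mathbf{a} + t\mathbf{h})\big)^2\right)^{1/2}.$$

¹⁸ $D_1 f = y^2 + \cos x$ and $D_2 f = 2xy$, so
$D_1^2 f = D_1(y^2 + \cos x) = -\sin x$, $D_2^2 f = D_2(2xy) = 2x$,
$D_1(D_2 f) = D_1(2xy) = 2y$, and $D_2(D_1 f) = D_2(y^2 + \cos x) = 2y$.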


Example 2.7.9 (Redoing Example 2.7.5 the easy way). Let’s see how
much easier it is to find a Lipschitz ratio in Example 2.7.5 using higher partial
derivatives. First we compute the first and second derivatives, for $f_1 = x_1 - x_2^3$
and $f_2 = x_1^3 + x_2$:

$$D_1 f_1 = 1; \quad D_2 f_1 = -3x_2^2; \quad D_1 f_2 = 3x_1^2; \quad D_2 f_2 = 1. \tag{2.7.40}$$

In Equation 2.7.41 we use the 
fact that crossed partials of f are 
equal (Theorem 3.3.9). 


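This gives

$$\begin{aligned} D_1 D_1 f_1 &= D_1 D_2 f_1 = D_2 D_1 f_1 = 0; & D_2 D_2 f_1 &= -6x_2 \\ D_1 D_1 f_2 &= 6x_1; \quad D_1 D_2 f_2 = D_2 D_1 f_2 = 0; & D_2 D_2 f_2 &= 0. \end{aligned} \tag{2.7.41}$$

So our Lipschitz ratio is $\sqrt{36x_1^2 + 36x_2^2} = 6\sqrt{x_1^2 + x_2^2}$: again we can assert that
if

$$|\mathbf{x}|, |\mathbf{y}| \le B, \quad\text{then}\quad \big|[D\mathbf{f}(\mathbf{x})] - [D\mathbf{f}(\mathbf{y})]\big| \le 6B|\mathbf{x} - \mathbf{y}|. \qquad \triangle \tag{2.7.42}$$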

Using higher partial derivatives, recompute the Lipschitz ratio of Example 2.7.4.
Do you get the same answer we did?¹⁹


¹⁹ The higher partial derivative method gives $2\sqrt{2}$; earlier we got 2, a better result.
A blunderbuss method guaranteed to work in all cases is unlikely to give results as
good as techniques adapted to the problem at hand. But the higher partial derivative
method gives results that are good enough.
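The recipe of Proposition 2.7.8 amounts to one line of arithmetic once the bounds are known. The sketch below applies it to Example 2.7.4, where the second partial derivatives are constants; the function name `lipschitz_ratio` and the dictionary layout keyed by (i, j, k) are our own illustrative choices.

```python
import math

def lipschitz_ratio(bounds):
    """Square root of the sum of squares of the bounds c_{i,j,k}."""
    return math.sqrt(sum(c ** 2 for c in bounds.values()))

# Example 2.7.4: f_1 = x_1 - x_2^2, f_2 = x_1^2 + x_2.
# Keys are (i, j, k) for |D_k D_j f_i|; the second partials are constants here.
bounds = {
    (1, 1, 1): 0, (1, 1, 2): 0, (1, 2, 1): 0, (1, 2, 2): 2,
    (2, 1, 1): 2, (2, 1, 2): 0, (2, 2, 1): 0, (2, 2, 2): 0,
}
M = lipschitz_ratio(bounds)
```

The result is $2\sqrt{2}$, larger than the ratio 2 found by direct computation in Example 2.7.4.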
By fiddling with the trigonometry,
one can get the $\sqrt{86}$ down to
$\sqrt{78} \approx 8.8$, but the advantage of
Proposition 2.7.8 is that it gives 
a systematic way to compute Lip- 
schitz ratios; you don’t have to 
worry about being clever. 

To go from the first to the sec- 
ond line of Equation 2.7.44 we use 
the fact that | sin | and | cos | are 
bounded by 1. 

The Kantorovitch theorem says
that if certain conditions are met,
the equation

$$\mathbf{f}(\mathbf{x}) = \mathbf{0}$$

has a unique root in a neighborhood
$U_0$. In our airplane analogy,
where is the neighborhood mentioned?
It is implicit in the Lipschitz
condition: the derivative is
Lipschitz with Lipschitz ratio M
in the neighborhood $U_0$.


Remember that acceleration 
need not be a change in speed — it 
can also be a change in direction. 


Example 2.7.10 (Finding a Lipschitz ratio using second derivatives: a
second example). Let us find a Lipschitz ratio for the derivative of

$$\mathbf{F}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \sin(x + y) \\ \cos(xy) \end{pmatrix}, \quad\text{for } |x| < 2,\ |y| < 2.$$

We compute

$$\begin{aligned} D_1 D_1 F_1 &= -\sin(x + y), \quad D_2 D_1 F_1 = D_1 D_2 F_1 = -\sin(x + y), \quad D_2 D_2 F_1 = -\sin(x + y); \\ D_1 D_1 F_2 &= -y^2 \cos(xy), \quad D_2 D_1 F_2 = D_1 D_2 F_2 = -(\sin(xy) + yx\cos(xy)), \quad D_2 D_2 F_2 = -x^2 \cos(xy). \end{aligned} \tag{2.7.43}$$

This gives

$$\begin{aligned} &\sqrt{4\sin^2(x + y) + y^4\cos^2 xy + x^4\cos^2 xy + 2(\sin xy + xy\cos xy)^2} \\ &\qquad \le \sqrt{4 + y^4 + x^4 + 2(1 + |xy|)^2}. \end{aligned} \tag{2.7.44}$$

So for |x| < 2, |y| < 2, we have a Lipschitz ratio $M \le \sqrt{4 + 16 + 16 + 50} = \sqrt{86} < 9.3$; i.e.,

$$\big|[D\mathbf{F}(\mathbf{u})] - [D\mathbf{F}(\mathbf{v})]\big| \le 9.3\,|\mathbf{u} - \mathbf{v}|. \qquad \triangle \tag{2.7.45}$$
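The bound $\sqrt{86}$ can be compared with what the second partial derivatives actually do on the square. The sketch below evaluates the quantity under the square root in Equation 2.7.44 on a grid; the grid resolution and the function name `second_partials_sum` are assumptions made for illustration, and a grid search only samples the region.

```python
import math

def second_partials_sum(x, y):
    """Sum of squares of all eight second partials of F = (sin(x+y), cos(xy))."""
    s, c = math.sin(x * y), math.cos(x * y)
    return (4 * math.sin(x + y) ** 2
            + (y ** 2 * c) ** 2 + (x ** 2 * c) ** 2
            + 2 * (s + x * y * c) ** 2)

steps = 80
grid = [-2 + 4 * i / steps for i in range(steps + 1)]
largest = max(math.sqrt(second_partials_sum(x, y)) for x in grid for y in grid)
```

The value `largest` found on the grid is below $\sqrt{86}$, consistent with the remark that the estimate is systematic rather than sharp.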


Kantorovitch’s theorem 

Now we are ready to tackle Kantorovitch’s theorem. It says that if the product
of three quantities is ≤ 1/2, then the equation $\mathbf{f}(\mathbf{x}) = \mathbf{0}$ has a unique root in
a neighborhood $U_0$, and if you start with initial guess $\mathbf{a}_0$ in that neighborhood,
Newton’s method will converge to that root.

The basic idea is simple. The first of the three quantities that must be small
is the value of the function at $\mathbf{a}_0$. If you are in an airplane flying close to
the ground, you are more likely to crash (find a root) than if you are several
kilometers up. The second quantity is the square of the inverse of the derivative
of the function at $\mathbf{a}_0$. In one dimension, we can think that the derivative must
be big.²⁰ If your plane is approaching the ground steeply, it is much more likely
to crash than if it is flying almost parallel to the ground.

The third quantity is the Lipschitz ratio M, measuring the change in the 
derivative (i.e., acceleration). If at the last minute the pilot pulls the plane out 
of a nose dive, some passengers or flight attendants may be thrown to the floor 
as the derivative changes sharply, but a crash will be avoided. 

[20] Why the theorem stipulates the square of the inverse of the derivative is more
subtle. We think of it this way: the theorem should remain true if one changes the
scale. Since the “numerator” $|\mathbf{f}(\mathbf{a}_0)|\,M$ in Equation 2.7.48 contains two terms, scaling
up will change it by the scale factor squared. So the “denominator” $|[\mathbf{Df}(\mathbf{a}_0)]^{-1}|^2$
must also contain a square.

Note that the domain and the
range of the mapping f have the 
same dimension. In other words, 
setting f(x) = 0, we get the same 
number of equations as unknowns. 
This is a reasonable requirement. 
If we had fewer equations than un- 
knowns we wouldn’t expect them 
to specify a unique solution, and 
if we had more equations than un- 
knowns it would be unlikely that 
there will be any solutions at all. 

In addition, if $n \ne m$, then
$[\mathbf{Df}(\mathbf{a}_0)]$ would not be a square
matrix, so it would not be invertible.

The Kantorovitch theorem is
proved in Appendix A.2.


But it is not each quantity individually that must be small: the product 
must be small. If the airplane starts its nose dive too close to the ground, even 
a sudden change in derivative may not save it. If it starts its nose dive from 
an altitude of several kilometers, it will still crash if it falls straight down. And 
if it loses altitude progressively, rather than plummeting to earth, it will still 
crash (or at least land) if the derivative never changes. 

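Theorem 2.7.11 (Kantorovitch’s theorem). Let $\mathbf{a}_0$ be a point in $\mathbb{R}^n$, $U$
an open neighborhood of $\mathbf{a}_0$ in $\mathbb{R}^n$ and $\mathbf{f} : U \to \mathbb{R}^n$ a differentiable mapping,
with its derivative $[\mathbf{Df}(\mathbf{a}_0)]$ invertible. Define

$$\mathbf{h}_0 = -[\mathbf{Df}(\mathbf{a}_0)]^{-1}\mathbf{f}(\mathbf{a}_0), \qquad \mathbf{a}_1 = \mathbf{a}_0 + \mathbf{h}_0, \qquad U_0 = \{\, \mathbf{x} \mid |\mathbf{x} - \mathbf{a}_1| \le |\mathbf{h}_0| \,\}. \tag{2.7.46}$$

If the derivative $[\mathbf{Df}(\mathbf{x})]$ satisfies the Lipschitz condition

$$|[\mathbf{Df}(\mathbf{u}_1)] - [\mathbf{Df}(\mathbf{u}_2)]| \le M\,|\mathbf{u}_1 - \mathbf{u}_2| \quad\text{for all points } \mathbf{u}_1, \mathbf{u}_2 \in U_0, \tag{2.7.47}$$

and if the inequality

$$|\mathbf{f}(\mathbf{a}_0)|\,|[\mathbf{Df}(\mathbf{a}_0)]^{-1}|^2\, M \le \frac{1}{2} \tag{2.7.48}$$

is satisfied, the equation $\mathbf{f}(\mathbf{x}) = \mathbf{0}$ has a unique solution in $U_0$, and Newton’s
method with initial guess $\mathbf{a}_0$ converges to it.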

If Inequality 2.7.48 is satisfied, then at each iteration we create a new ball
inside the previous ball, and with at most half the radius of the previous: $U_1$ is
in $U_0$, $U_2$ is in $U_1$, ..., as shown to the right of Figure 2.7.3. In particular, the
Lipschitz condition that is valid for $U_0$ is valid for all subsequent balls. As the
radius of the balls goes to zero, the sequence $\mathbf{a}_0, \mathbf{a}_1, \ldots$ converges to $\mathbf{a}$, which
we will see is a root.


FIGURE 2.7.3. Equation 2.7.46 defines the neighborhood $U_0$ for which Newton’s
method is guaranteed to work when the inequality of Equation 2.7.48 is satisfied.
Left: the neighborhood $U_0$ is the ball of radius $|\mathbf{h}_0| = |\mathbf{a}_1 - \mathbf{a}_0|$ around $\mathbf{a}_1$, so $\mathbf{a}_0$ is on
the border of $U_0$. Right: a blow-up of $U_0$, showing the neighborhood $U_1$.

Equation 2.7.48:
$|\mathbf{f}(\mathbf{a}_0)|\,|[\mathbf{Df}(\mathbf{a}_0)]^{-1}|^2\, M \le \frac{1}{2}$

A good way to check whether 
some equation makes sense is to 
make sure that both sides have 
the same units. In physics this is 
essential. 


We “happened to notice” that
$\sin 2 - 1 = -.0907$ with the help
of a calculator. Finding an initial
condition for Newton’s method is
always the delicate part.

The Kantorovitch theorem
does not say that the system of
equations has a unique solution; it
may have many. But it has one
unique solution in the neighborhood
$U_0$, and if you start with the
initial guess $\mathbf{a}_0$, Newton’s method
will find it for you.


Recall that the inverse of 



The right units. Note that the right-hand side of Equation 2.7.48 is the
unitless number 1/2:

$$|\mathbf{f}(\mathbf{a}_0)|\,|[\mathbf{Df}(\mathbf{a}_0)]^{-1}|^2\, M \le \frac{1}{2}. \tag{2.7.49}$$

All the units of the left-hand side cancel. This is fortunate, because there is
no reason to think that the domain $U$ will have the same units as the range;
although both spaces have the same dimension, they can be very different. For
example, the units of $U$ might be temperature and the units of the range might
be volume, with $\mathbf{f}$ measuring volume as a function of temperature. (In this
one-dimensional case, $\mathbf{f}$ would be $f$.)

Let us see that the units on the left-hand side of Equation 2.7.48 cancel. We’ll
denote by $u$ the units of the domain, $U$, and by $r$ the units of the range, $\mathbb{R}^n$. The
term $|\mathbf{f}(\mathbf{a}_0)|$ has units $r$. A derivative has units range/domain (typically, distance
divided by time), so the inverse of the derivative has units domain/range
$= u/r$, and the term $|[\mathbf{Df}(\mathbf{a}_0)]^{-1}|^2$ has units $u^2/r^2$. The Lipschitz ratio $M$ is the
distance between derivatives divided by a distance in the domain, so its units
are $r/u$ divided by $u$. This gives the following units:

$$r \times \frac{u^2}{r^2} \times \frac{r}{u^2}; \tag{2.7.50}$$

both the $r$’s and the $u$’s cancel out.


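Example 2.7.12 (Using Newton’s method). Suppose we want to solve
the two equations

$$\begin{aligned}\cos(x-y) &= y\\ \sin(x+y) &= x,\end{aligned} \qquad\text{i.e.,}\qquad F\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}\cos(x-y) - y\\ \sin(x+y) - x\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix}. \tag{2.7.51}$$

We just happen to notice that the equation is close to being satisfied at $\begin{pmatrix}1\\1\end{pmatrix}$:

$$\cos(1-1) - 1 = 0 \quad\text{and}\quad \sin(1+1) - 1 = -.0907\ldots. \tag{2.7.52}$$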

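Let us check that starting Newton’s method at $\mathbf{a}_0 = \begin{pmatrix}1\\1\end{pmatrix}$ works. To do this
we must see that the inequality of Equation 2.7.48 is satisfied. We just saw that
$|F(\mathbf{a}_0)| \approx .0907 < .1$. The derivative at $\mathbf{a}_0$ isn’t much worse:

$$[\mathbf{D}F(\mathbf{a}_0)] = \begin{bmatrix}0 & -1\\ \cos 2 - 1 & \cos 2\end{bmatrix}, \quad\text{so}\quad [\mathbf{D}F(\mathbf{a}_0)]^{-1} = \frac{1}{\cos 2 - 1}\begin{bmatrix}\cos 2 & 1\\ 1 - \cos 2 & 0\end{bmatrix}, \tag{2.7.53}$$

and

$$|[\mathbf{D}F(\mathbf{a}_0)]^{-1}|^2 = \frac{1}{(\cos 2 - 1)^2}\bigl((\cos 2)^2 + 1 + (1 - \cos 2)^2\bigr) \approx 1.585 < 2,$$

as you will see if you put it in your calculator.

Rather than compute the Lipschitz ratio for the derivative using higher partial
derivatives, we will do it directly, taking advantage of the helpful formulas
$|\sin a - \sin b| \le |a - b|$, $|\cos a - \cos b| \le |a - b|$.[21] These make the computation
manageable:

$$\begin{aligned}
\left|\left[\mathbf{D}F\begin{pmatrix}x_1\\y_1\end{pmatrix}\right] - \left[\mathbf{D}F\begin{pmatrix}x_2\\y_2\end{pmatrix}\right]\right|
&= \left|\begin{bmatrix}-\sin(x_1 - y_1) + \sin(x_2 - y_2) & \sin(x_1 - y_1) - \sin(x_2 - y_2)\\ \cos(x_1 + y_1) - \cos(x_2 + y_2) & \cos(x_1 + y_1) - \cos(x_2 + y_2)\end{bmatrix}\right|\\
&\le \left|\begin{bmatrix}|-(x_1 - y_1) + (x_2 - y_2)| & |(x_1 - y_1) - (x_2 - y_2)|\\ |(x_1 + y_1) - (x_2 + y_2)| & |(x_1 + y_1) - (x_2 + y_2)|\end{bmatrix}\right|\\
&= \sqrt{4\bigl((x_1 - x_2)^2 + (y_1 - y_2)^2\bigr)} = 2\left|\begin{pmatrix}x_1\\y_1\end{pmatrix} - \begin{pmatrix}x_2\\y_2\end{pmatrix}\right|.
\end{aligned} \tag{2.7.54}$$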


Thus $M = 2$ is a Lipschitz constant for $[\mathbf{D}F]$. Putting these together, we see
that

$$|F(\mathbf{a}_0)|\,|[\mathbf{D}F(\mathbf{a}_0)]^{-1}|^2\, M \le .1 \cdot 2 \cdot 2 = .4 < .5 \tag{2.7.55}$$


The Kantorovitch theorem
does not say that if Inequality
2.7.48 is not satisfied, the equation
has no solutions; it does not even
say that if the inequality is not satisfied,
there are no solutions in the
neighborhood $U_0$. In Section 2.8
we will see that if we use a different
way to measure $[\mathbf{Df}(\mathbf{a}_0)]$, which is
harder to compute, then Inequality
2.7.48 is easier to satisfy. That
version of Kantorovitch’s theorem
thus guarantees convergence for
some equations about which this
somewhat weaker version of the
theorem is silent.

The MATLAB program “Newton.m”
is found in Appendix B.1.


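so the equation has a solution, and Newton’s method starting at $\begin{pmatrix}1\\1\end{pmatrix}$ will
converge to it. Moreover,

$$\mathbf{h}_0 = -\frac{1}{\cos 2 - 1}\begin{bmatrix}\cos 2 & 1\\ 1 - \cos 2 & 0\end{bmatrix}\begin{bmatrix}0\\ \sin 2 - 1\end{bmatrix} = \begin{bmatrix}\dfrac{\sin 2 - 1}{1 - \cos 2}\\ 0\end{bmatrix} \approx \begin{bmatrix}-.064\\ 0\end{bmatrix}, \tag{2.7.56}$$

so Kantorovitch’s theorem guarantees that the solution is within .064 of $\begin{pmatrix}.936\\1\end{pmatrix}$.
The computer says that the solution is actually $\begin{pmatrix}.935\\.998\end{pmatrix}$, correct to three decimal
places. $\triangle$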
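The numbers in Example 2.7.12 are easy to check by machine. The following is a minimal sketch in Python (not the book's MATLAB program); the helper names `inverse_2x2`, `length` and `newton_step` are ours, chosen for illustration. It evaluates the three factors of the Kantorovitch inequality at $\mathbf{a}_0 = (1, 1)$ with the Lipschitz ratio $M = 2$ found above, then runs a few Newton steps.

```python
import math

def F(x, y):
    """The mapping of Example 2.7.12."""
    return (math.cos(x - y) - y, math.sin(x + y) - x)

def DF(x, y):
    """Jacobian matrix of F, stored as a tuple of rows."""
    return ((-math.sin(x - y), math.sin(x - y) - 1.0),
            (math.cos(x + y) - 1.0, math.cos(x + y)))

def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix by the determinant formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def length(values):
    """Square root of the sum of squares (the length |.| of a vector or matrix)."""
    return math.sqrt(sum(v * v for v in values))

def newton_step(x, y):
    """One step a -> a - [DF(a)]^-1 F(a)."""
    inv = inverse_2x2(DF(x, y))
    f1, f2 = F(x, y)
    return (x - (inv[0][0] * f1 + inv[0][1] * f2),
            y - (inv[1][0] * f1 + inv[1][1] * f2))

a0 = (1.0, 1.0)
inv0 = inverse_2x2(DF(*a0))
norm_F = length(F(*a0))
norm_inv_squared = length([e for row in inv0 for e in row]) ** 2
M = 2.0  # Lipschitz ratio from Equation 2.7.54
product = norm_F * norm_inv_squared * M

a1 = newton_step(*a0)
root = a0
for _ in range(6):
    root = newton_step(*root)

print("Kantorovitch product:", product)
print("a1 =", a1, " root =", root)
```

The product printed is what Inequality 2.7.48 compares with 1/2; any value at most 1/2 is enough for the theorem to apply.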


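Example 2.7.13 (Newton’s method using a computer). Now we will
use the MATLAB program to solve the equations

$$x^2 - y + \sin(x - y) = 2 \quad\text{and}\quad y^2 - x = 3, \tag{2.7.57}$$

starting at $\begin{pmatrix}2\\2\end{pmatrix}$ and at $\begin{pmatrix}-2\\2\end{pmatrix}$.

The equation we are solving is

$$F\begin{pmatrix}x\\y\end{pmatrix} = \begin{bmatrix}x^2 - y + \sin(x - y) - 2\\ y^2 - x - 3\end{bmatrix} = \begin{bmatrix}0\\0\end{bmatrix}. \tag{2.7.58}$$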


[21] By the mean value theorem, there exists a $c$ between $a$ and $b$ such that

$$|\sin a - \sin b| = |\underbrace{\cos c}_{\sin' c}|\,|a - b|;$$

since $|\cos c| \le 1$, we have $|\sin a - \sin b| \le |a - b|$.

In fact, it superconverges: the
number of correct decimals roughly
doubles at each iteration; we
see 1, then 3, then 8, then 14 correct
decimals. We will discuss superconvergence
in detail in Section 2.8.


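Starting at $\begin{pmatrix}2\\2\end{pmatrix}$, the MATLAB Newton program gives the following values:

$$\begin{aligned}
\mathbf{x}_0 &= \begin{pmatrix}2\\2\end{pmatrix}, &
\mathbf{x}_1 &= \begin{pmatrix}2.11111111111111\\2.27777777777778\end{pmatrix}, &
\mathbf{x}_2 &= \begin{pmatrix}2.10131373055664\\2.25868946388913\end{pmatrix},\\
\mathbf{x}_3 &= \begin{pmatrix}2.10125829441818\\2.25859653392414\end{pmatrix}, &
\mathbf{x}_4 &= \begin{pmatrix}2.10125829294805\\2.25859653168689\end{pmatrix},
\end{aligned} \tag{2.7.59}$$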


and the first 14 decimals don’t change after that. Newton’s method certainly 
does appear to converge. 

But are the conditions of Kantorovitch’s theorem satisfied? The MATLAB
program prints out a “condition number,” cond, at each iteration, which is
$|F(\mathbf{x}_i)|\,|[\mathbf{D}F(\mathbf{x}_i)]^{-1}|^2$. Kantorovitch’s theorem says that Newton’s method
will converge if cond $\cdot\, M \le 1/2$, where $M$ is a Lipschitz constant for $[\mathbf{D}F]$ on
$U_i$.

We first computed this Lipschitz constant without higher partial derivatives,
and found it quite tricky. It’s considerably easier with higher partial derivatives:

$$\begin{aligned}
D_1D_1f_1 &= 2 - \sin(x - y); & D_1D_2f_1 &= \sin(x - y); & D_2D_2f_1 &= -\sin(x - y);\\
D_1D_1f_2 &= 0; & D_1D_2f_2 &= 0; & D_2D_2f_2 &= 2,
\end{aligned} \tag{2.7.60}$$

so

$$\sum_{i,j,k}(D_iD_jf_k)^2 = (2 - \sin(x - y))^2 + 2(\sin(x - y))^2 + (\sin(x - y))^2 + 4 \le 9 + 2 + 1 + 4 = 16; \tag{2.7.61}$$

$M = 4$ is a Lipschitz constant for $[\mathbf{D}F]$ on all of $\mathbb{R}^2$.

Let us see whether cond $\cdot\, M \le 1/2$. At the first iteration, cond
$= 0.1419753$ (the exact value is $46/324 = (\sqrt{46}/18)^2$), and $4 \times 0.1419753 > .5$. So Kantorovitch’s
theorem does not assert convergence, but it isn’t far off. At the next
iteration, we find cond $= 0.00874714275069$, and this works with a lot to spare.

What happens if we start at $\begin{pmatrix}-2\\2\end{pmatrix}$? The computer gives

$$\begin{aligned}
\mathbf{x}_0 &= \begin{pmatrix}-2\\2\end{pmatrix}, &
\mathbf{x}_1 &= \begin{pmatrix}-1.78554433070248\\1.30361391732438\end{pmatrix}, &
\mathbf{x}_2 &= \begin{pmatrix}-1.82221637692367\\1.10354485721642\end{pmatrix},\\
\mathbf{x}_3 &= \begin{pmatrix}-1.82152790765992\\1.08572086062422\end{pmatrix}, &
\mathbf{x}_4 &= \begin{pmatrix}-1.82151878937233\\1.08557875385529\end{pmatrix}, &
\mathbf{x}_5 &= \begin{pmatrix}-1.82151878872556\\1.08557874485200\end{pmatrix},
\end{aligned}$$

and again the numbers do not change if we iterate the process further. It
certainly converges fast. The condition numbers are

$$0.3337,\ 0.1036,\ 0.01045,\ \ldots. \tag{2.7.62}$$

The computation we had made for the Lipschitz constant of the derivative is
still valid, so we see that the condition of Kantorovitch’s theorem fails rather
badly at the first step (and indeed, the first step is rather large), but succeeds
(just barely) at the second. $\triangle$
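For readers without access to the MATLAB program, the iterations of Example 2.7.13 can be reproduced with a short Python sketch. This is our own illustration, not the program of Appendix B.1; the function names `cond` and `newton` are assumptions, and `cond` uses the length (root of the sum of squares) as in the text.

```python
import math

def F(x, y):
    """The mapping of Equation 2.7.58."""
    return (x * x - y + math.sin(x - y) - 2.0, y * y - x - 3.0)

def DF(x, y):
    """Jacobian matrix of F, stored as a tuple of rows."""
    c = math.cos(x - y)
    return ((2.0 * x + c, -1.0 - c), (-1.0, 2.0 * y))

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def cond(x, y):
    """|F(x)| |[DF(x)]^-1|^2, the quantity compared with 1/(2M)."""
    inv = inverse_2x2(DF(x, y))
    return math.hypot(*F(x, y)) * sum(e * e for row in inv for e in row)

def newton(x, y, steps):
    """Return (x, y, cond) for the starting point and for each iterate."""
    history = [(x, y, cond(x, y))]
    for _ in range(steps):
        inv = inverse_2x2(DF(x, y))
        f1, f2 = F(x, y)
        x, y = (x - (inv[0][0] * f1 + inv[0][1] * f2),
                y - (inv[1][0] * f1 + inv[1][1] * f2))
        history.append((x, y, cond(x, y)))
    return history

M = 4.0  # Lipschitz constant from Equation 2.7.61
for start in ((2.0, 2.0), (-2.0, 2.0)):
    print("start", start)
    for x, y, c in newton(*start, steps=6):
        print(f"  {x:.14f} {y:.14f}  cond={c:.6g}  cond*M <= 1/2: {c * M <= 0.5}")
```

The last column shows at which iterate the hypothesis of Kantorovitch’s theorem first holds for each starting point.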


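$$\underbrace{\mathbf{a}_0}_{\text{point in domain}}\ \underbrace{-\,[\mathbf{Df}(\mathbf{a}_0)]^{-1}\overbrace{\mathbf{f}(\mathbf{a}_0)}^{\text{vector in range}}}_{\text{increment in domain}}$$

point minus vector equals point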


Remark. Although both the domain and the range of Newton’s method are
$n$-dimensional, you should think of them as different spaces. As we mentioned,
in many practical applications they have different units. It is further a good
idea to think of the domain as made up of points, and the range as made up of
vectors. Thus $\mathbf{f}(\mathbf{a}_i)$ is a vector, and $\mathbf{h}_i = -[\mathbf{Df}(\mathbf{a}_i)]^{-1}\mathbf{f}(\mathbf{a}_i)$ is an increment in
the domain, i.e., a vector. The next point $\mathbf{a}_{i+1} = \mathbf{a}_i + \mathbf{h}_i$ is really a point: the
sum of a point and an increment.

Remark. You may not find Newton’s method entirely satisfactory; what if you
don’t know an initial “seed” $\mathbf{a}_0$? Newton’s method is guaranteed to work only
when you know something to start out. If you don’t, you have to guess and hope
for the best. Actually, this isn’t quite true. In the nineteenth century, Cayley
showed that for any quadratic equation, Newton’s method essentially always
works. But quadratic equations form the only case where Newton’s method
does not exhibit chaotic behavior.[22]


2.8 Superconvergence 


Kantorovitch’s theorem is in some sense optimal: you cannot do better than 
the given inequalities unless you strengthen the hypotheses. 


Example 2.8.1 (Slow convergence). Consider solving $f(x) = (x - 1)^2 = 0$
by Newton’s method, starting at $a_0 = 0$. Exercise 2.8.1 asks you to show that
the best Lipschitz ratio for $f'$ is 2, so the product

$$|f(a_0)|\,|(f'(a_0))^{-1}|^2\, M = 1 \cdot \left(-\frac{1}{2}\right)^2 \cdot 2 = \frac{1}{2} \tag{2.8.1}$$

and Theorem 2.7.11 guarantees that Newton’s method will work, and will converge
to the unique root $a = 1$. The exercise further asks you to check that
$h_n = 1/2^{n+1}$, so $a_n = 1 - 1/2^n$, exactly the rate of convergence advertised. $\triangle$
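The halving in Example 2.8.1 is easy to watch numerically. A minimal Python sketch (the helper name `newton_iterates` is ours): each step moves the guess exactly halfway to the root, since $a - f(a)/f'(a) = a - (a - 1)/2$.

```python
def newton_iterates(f, fprime, a0, steps):
    """Return [a_0, a_1, ..., a_steps] for one-variable Newton's method."""
    iterates = [a0]
    for _ in range(steps):
        a = iterates[-1]
        iterates.append(a - f(a) / fprime(a))
    return iterates

iterates = newton_iterates(lambda x: (x - 1) ** 2, lambda x: 2 * (x - 1), 0.0, 10)
steps = [b - a for a, b in zip(iterates, iterates[1:])]  # h_0, h_1, ...

for n, a in enumerate(iterates):
    print(n, a, 1 - a)  # the distance to the root a = 1 halves each time
```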

Example 2.8.1 is both true and squarely misleading. If at each step Newton’s 
method only halved the distance between guess and root, a number of simpler 
algorithms (bisection, for example) would work just as well. 


[22] For a precise description of how Newton’s method works for quadratic equations,
and for a description of how things can go wrong in other cases, see J. Hubbard and
B. West, Differential Equations, A Dynamical Systems Approach, Part I, Texts in
Applied Mathematics No. 5, Springer-Verlag, N.Y., 1991, pp. 227-235.

Newton’s method is the favorite scheme for solving equations because usually
it converges much, much faster than in Example 2.8.1. If, instead of allowing
the product in Inequality 2.7.48 to be $\le 1/2$, we insist that it be strictly less
than 1/2:

$$|\mathbf{f}(\mathbf{a}_0)|\,|[\mathbf{Df}(\mathbf{a}_0)]^{-1}|^2\, M = k < \frac{1}{2}, \tag{2.8.2}$$


As a rule of thumb, if Newton’s 
method hasn’t converged to a root 
in 20 steps, you’ve chosen a poor 
initial condition. 


then Newton’s method superconverges. 

How soon Newton’s method starts superconverging depends on the problem 
at hand. But once it starts, it is so fast that within four more steps you will 
have computed your answer to as many digits as a computer can handle. In 
practice, when Newton’s method works at all, it starts superconverging soon. 

What do we mean when we say that a sequence ao,ai, . . . superconverges? 
Our definition is the following: 


The 1/2 in $x_0 = 1/2$ is unrelated
to the 1/2 of Equation 2.8.2.
If we were to define superconvergence
using digits in base 10, then
the same sequence would superconverge
starting at $x_3 \le 1/10$.
For it to start superconverging at
$x_0$, we would have to have $x_0 \le 1/10$.


Definition 2.8.2 (Superconvergence). Set $x_i = |a_{i+1} - a_i|$; i.e., $x_i$ represents
the difference between two successive entries of the sequence. We will
say that the sequence $a_0, a_1, \ldots$ superconverges if, when the $x_i$ are written
in base 2, then each number $x_i$ starts with $2^i - 1 \approx 2^i$ zeroes.

For example, the sequence $x_{n+1} = x_n^2$, starting with $x_0 = 1/2$ (written
.1 in base 2), superconverges to zero, as shown in the left-hand side of Figure
2.8.1. By comparison, the right-hand side of Figure 2.8.1 shows the convergence
achieved in Example 2.8.1, again starting with $x_0 = 1/2$.


x_0 = .1                          x_0 = .1
x_1 = .01                         x_1 = .01
x_2 = .0001                       x_2 = .001
x_3 = .00000001                   x_3 = .0001
x_4 = .0000000000000001           x_4 = .00001

Figure 2.8.1. Left: superconvergence. Right: the convergence guaranteed by Kantorovitch’s
theorem. In both cases, numbers are written in base 2: .1 = 1/2, .01 =
1/4, .001 = 1/8, ....


We will see that what goes wrong for Example 2.8.1 is that at the root $a = 1$,
$f'(a) = 0$, so the derivative of $f$ is not invertible at the limit point: $1/f'(1)$
does not exist. Whenever the derivative is invertible at the limit point, we do
have superconvergence. This occurs as soon as Equation 2.8.2 is satisfied: as
soon as the product in the Kantorovitch inequality is strictly less than 1/2.

Even if $k$ is almost 1/2, so that
$c$ is large, the factor $(1/2)^{2^m}$ will
soon predominate.


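Theorem 2.8.3 (Newton’s method superconverges). Let the conditions
of the Kantorovitch theorem 2.7.11 be satisfied, but with the stronger
assumption that

$$|\mathbf{f}(\mathbf{a}_0)|\,|[\mathbf{Df}(\mathbf{a}_0)]^{-1}|^2\, M = k < \frac{1}{2}. \tag{2.8.2}$$

Set

$$c = \frac{1 - k}{1 - 2k}\,|[\mathbf{Df}(\mathbf{a}_0)]^{-1}|\,\frac{M}{2}. \tag{2.8.3}$$

$$\text{If } |\mathbf{h}_n| \le \frac{1}{2c}, \text{ then } |\mathbf{h}_{n+m}| \le \frac{1}{c}\left(\frac{1}{2}\right)^{2^m}. \tag{2.8.4}$$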

Equation 2.8.4 means superconvergence. Since $|\mathbf{h}_n| = |\mathbf{a}_{n+1} - \mathbf{a}_n|$, starting at
step $n$ and using Newton’s method for $m$ iterations causes the distance between
$\mathbf{a}_n$ and $\mathbf{a}_{n+m}$ to shrink to practically nothing before our eyes. For example, if
$m = 10$:

$$|\mathbf{h}_{n+m}| \le \frac{1}{c}\cdot\left(\frac{1}{2}\right)^{1024}. \tag{2.8.5}$$

The proof requires the following lemma, proved in Appendix A.3.

Lemma 2.8.4. If the conditions of Theorem 2.8.3 are satisfied, then for all $i$,

$$|\mathbf{h}_{i+1}| \le c\,|\mathbf{h}_i|^2. \tag{2.8.6}$$

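Proof of Theorem 2.8.3. Let $x_i = c\,|\mathbf{h}_i|$. Then

$$x_{i+1} = c\,|\mathbf{h}_{i+1}| \le c^2|\mathbf{h}_i|^2 = x_i^2. \tag{2.8.7}$$

Our assumption that $|\mathbf{h}_n| \le \frac{1}{2c}$ tells us that $x_n \le 1/2$. So

$$\begin{aligned}
x_{n+1} &\le x_n^2 \le \frac{1}{4} = \left(\frac{1}{2}\right)^{2},\\
x_{n+2} &\le (x_{n+1})^2 \le x_n^4 \le \frac{1}{16} = \left(\frac{1}{2}\right)^{2^2},\\
&\ \ \vdots\\
x_{n+m} &\le x_n^{2^m} \le \left(\frac{1}{2}\right)^{2^m}.
\end{aligned} \tag{2.8.8}$$

Since $x_{n+m} = c\,|\mathbf{h}_{n+m}|$, we have the result we want, Equation 2.8.4:

$$\text{if } |\mathbf{h}_n| \le \frac{1}{2c}, \text{ then } |\mathbf{h}_{n+m}| \le \frac{1}{c}\left(\frac{1}{2}\right)^{2^m}. \quad\square \tag{2.8.9}$$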




Kantorovitch’s theorem: a stronger version (optional) 

We have seen that Newton’s method converges much faster than guaranteed
by Kantorovitch’s theorem. In this subsection we show that it is possible to
state Kantorovitch’s theorem in such a way that it will apply to a larger class
of functions. We do this by using a different way to measure linear mappings:
the norm $\|A\|$ of a matrix $A$.

Definition 2.8.5 (The norm of a matrix). The norm $\|A\|$ of a matrix $A$
is

$$\|A\| = \sup |A\mathbf{x}|, \quad\text{when } |\mathbf{x}| = 1. \tag{2.8.10}$$


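Multiplication by the matrix $A$
of Example 2.8.6 can at most double
the length of a vector; it does
not always do so; the product $A\mathbf{b}$,
where $A = \begin{bmatrix}2&0\\0&1\end{bmatrix}$ and $\mathbf{b} = \begin{bmatrix}0\\1\end{bmatrix}$,
is $\begin{bmatrix}0\\1\end{bmatrix}$, with length 1.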

In Example 2.8.6, note that $\|A\| = 2$, while $|A| = \sqrt{5}$. It is always true that

$$\|A\| \le |A|; \tag{2.8.12}$$

this follows from Proposition 1.4.11, as you are asked to show in Exercise 2.8.2.

This is why using the norm $\|A\|$ rather than the length $|A|$ makes Kantorovitch’s
theorem stronger: the theorem applies equally well when we use
the norm rather than the length to measure the derivative $[\mathbf{Df}(\mathbf{x})]$ and its inverse,
and the key inequality of that theorem, Equation 2.7.48, is easier to
satisfy using the norm.

Theorem 2.8.7 (Kantorovitch’s theorem: a stronger version). Kantorovitch’s
theorem 2.7.11 still holds if you replace all lengths of matrices by
norms of matrices.

Proof. In the proof of Theorem 2.7.11 we only used the triangle inequality
and Proposition 1.4.11, and these hold for the norm $\|A\|$ of a matrix $A$ as well
as for its length $|A|$, as Exercises 2.8.3 and 2.8.4 ask you to show. $\square$


There are many equations for 
which convergence is guaranteed if 
one uses the norm, but not if one 
uses the length. 


This means that $\|A\|$ is the maximum amount by which multiplication by $A$
will stretch a vector.

Example 2.8.6 (Norm of a matrix). Take

$$A = \begin{bmatrix}2&0\\0&1\end{bmatrix} \quad\text{and}\quad \mathbf{x} = \begin{bmatrix}x\\y\end{bmatrix}, \quad\text{so that}\quad A\mathbf{x} = \begin{bmatrix}2x\\y\end{bmatrix}.$$

Since by definition $|\mathbf{x}| = \sqrt{x^2 + y^2} = 1$, we have

$$\|A\| = \sup_{|\mathbf{x}|=1} |A\mathbf{x}| = \sup \sqrt{4x^2 + y^2} = 2 \quad(\text{setting } x = 1,\ y = 0). \quad\triangle \tag{2.8.11}$$

Figure 2.8.2. The diagram of the trigonometric circle for Example 2.8.8.


Unfortunately, the norm is usually much harder to compute than the length.
In Equation 2.8.11 above, it is not difficult to see that 2 is the largest value of
$\sqrt{4x^2 + y^2}$ compatible with the requirement that $\sqrt{x^2 + y^2} = 1$, obtained by
setting $x = 1$ and $y = 0$. Computing the norm is not often that easy.


Example 2.8.8 (Norm is harder to compute). The length of the matrix
$A = \begin{bmatrix}1&1\\0&1\end{bmatrix}$ is $\sqrt{1^2 + 1^2 + 1^2} = \sqrt{3}$, or about 1.732. The norm is $\frac{1+\sqrt5}{2}$, or
about 1.618; arriving at that figure takes some work, as follows. A vector $\begin{bmatrix}x\\y\end{bmatrix}$
with length 1 can be written $\begin{bmatrix}\cos t\\ \sin t\end{bmatrix}$, and the product of $A$ and that vector is
$\begin{bmatrix}\cos t + \sin t\\ \sin t\end{bmatrix}$, so the object is to find

$$\sup_t \sqrt{(\cos t + \sin t)^2 + \sin^2 t}. \tag{2.8.13}$$

At its maximum and minimum, the derivative of a function is 0, so we need
to see where the derivative of $(\cos t + \sin t)^2 + \sin^2 t$ vanishes. That derivative
is $\sin 2t + 2\cos 2t$, which vanishes for $2t = \arctan(-2)$. We have two possible
angles to look for, $t_1$ and $t_2$, as shown in Figure 2.8.2; they can be computed
with a calculator or with a bit of trigonometry, and we can choose the one that
gives the biggest value for Equation 2.8.13. Since the entries of the matrix $A$
are all positive, we choose $t_1$, in the first quadrant, as being the best bet.

By similar triangles, we find that

$$\cos 2t_1 = -\frac{1}{\sqrt5} \quad\text{and}\quad \sin 2t_1 = \frac{2}{\sqrt5}. \tag{2.8.14}$$

Using the formula $\cos 2t = 2\cos^2 t - 1 = 1 - 2\sin^2 t$, we find that

$$\cos t_1 = \sqrt{\frac{1}{2}\left(1 - \frac{1}{\sqrt5}\right)} \quad\text{and}\quad \sin t_1 = \sqrt{\frac{1}{2}\left(1 + \frac{1}{\sqrt5}\right)}, \tag{2.8.15}$$

which, after some computation, gives

$$\left|\begin{bmatrix}\cos t_1 + \sin t_1\\ \sin t_1\end{bmatrix}\right|^2 = \frac{3 + \sqrt5}{2}, \tag{2.8.16}$$

and finally

$$\|A\| = \left\|\begin{bmatrix}1&1\\0&1\end{bmatrix}\right\| = \sqrt{\frac{3 + \sqrt5}{2}} = \sqrt{\frac{6 + 2\sqrt5}{4}} = \frac{1 + \sqrt5}{2}. \tag{2.8.17}$$

Remark. We could have used the following formula for computing the norm
of a 2 x 2 matrix from its length and its determinant:

$$\|A\| = \sqrt{\frac{|A|^2 + \sqrt{|A|^4 - 4(\det A)^2}}{2}}. \tag{2.8.18}$$
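Formula 2.8.18 can be compared against the definition of the norm as a supremum over unit vectors. The Python sketch below is our own illustration: `norm_by_formula` implements Equation 2.8.18, while `norm_by_sampling` is a crude stand-in for the supremum that simply tries many points on the unit circle (the sample count is an arbitrary choice).

```python
import math

def length(A):
    """Length |A|: square root of the sum of the squares of the entries."""
    return math.sqrt(sum(e * e for row in A for e in row))

def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def norm_by_formula(A):
    """Norm of a 2 x 2 matrix from its length and determinant (Equation 2.8.18)."""
    l2 = length(A) ** 2
    inner = max(0.0, l2 * l2 - 4.0 * det(A) ** 2)  # guard against tiny negative rounding
    return math.sqrt((l2 + math.sqrt(inner)) / 2.0)

def norm_by_sampling(A, samples=100000):
    """Approximate sup |Ax| over |x| = 1 by sampling the unit circle."""
    best = 0.0
    for k in range(samples):
        t = 2.0 * math.pi * k / samples
        x, y = math.cos(t), math.sin(t)
        best = max(best, math.hypot(A[0][0] * x + A[0][1] * y,
                                    A[1][0] * x + A[1][1] * y))
    return best

A = [[1.0, 1.0], [0.0, 1.0]]  # Example 2.8.8
B = [[2.0, 0.0], [0.0, 1.0]]  # Example 2.8.6
for name, matrix in (("A", A), ("B", B)):
    print(name, length(matrix), norm_by_formula(matrix), norm_by_sampling(matrix))
```

In both cases the norm comes out no larger than the length, as Equation 2.8.12 requires.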





In higher dimensions things are much worse. It was to avoid this kind of complication
that we used the length rather than the norm when we proved Kantorovitch’s
theorem in Section 2.7. $\triangle$

In some cases, however, the norm is easier to use than the length, as in the 
following example. In particular, norms of multiples of the identity matrix are 
easy to compute: such a norm is just the absolute value of the multiple. 


“Mat” of course stands for
“matrix”; Mat (2, 2) is the space
of 2 x 2 matrices.

You might think that something
like $\begin{bmatrix}3&1\\-1&3\end{bmatrix}$ would be even
better, but squaring that gives
$\begin{bmatrix}8&6\\-6&8\end{bmatrix}$. In addition, starting
with a diagonal matrix makes our
computations easier.

You may recognize the $AB + BA$
in Equation 2.8.22 from Equation
1.7.45, Example 1.7.15.

Example 2.8.9 (Using the norm in Newton’s method). Suppose we
want to find a 2 x 2 matrix $A$ such that $A^2 = \begin{bmatrix}8&1\\-1&10\end{bmatrix}$. So we define $F$ :
Mat (2, 2) $\to$ Mat (2, 2) by

$$F(A) = A^2 - \begin{bmatrix}8&1\\-1&10\end{bmatrix}, \tag{2.8.19}$$


and try to solve it by Newton’s method. First we choose an initial point $A_0$. A
logical place to start would seem to be the matrix

$$A_0 = \begin{bmatrix}3&0\\0&3\end{bmatrix}, \quad\text{so that}\quad A_0^2 = \begin{bmatrix}9&0\\0&9\end{bmatrix}. \tag{2.8.20}$$


We want to see whether the Kantorovitch inequality 2.7.48 is satisfied, i.e., that

$$|F(A_0)| \cdot M\, \|[\mathbf{D}F(A_0)]^{-1}\|^2 \le \frac{1}{2}. \tag{2.8.21}$$

First, compute the derivative:

$$[\mathbf{D}F(A)]B = AB + BA. \tag{2.8.22}$$

The following computation shows that $A \mapsto [\mathbf{D}F(A)]$ is Lipschitz with respect
to the norm, with Lipschitz ratio 2 on all of Mat (2, 2):

$$\begin{aligned}
\|[\mathbf{D}F(A_1)] - [\mathbf{D}F(A_2)]\| &= \sup_{|B|=1} \bigl|\bigl([\mathbf{D}F(A_1)] - [\mathbf{D}F(A_2)]\bigr)B\bigr|\\
&= \sup_{|B|=1} |A_1B + BA_1 - A_2B - BA_2| = \sup_{|B|=1} |(A_1 - A_2)B + B(A_1 - A_2)|\\
&\le \sup_{|B|=1} |(A_1 - A_2)B| + |B(A_1 - A_2)| \le \sup_{|B|=1} |A_1 - A_2||B| + |B||A_1 - A_2|\\
&\le \sup_{|B|=1} 2|B||A_1 - A_2| = 2|A_1 - A_2|.
\end{aligned} \tag{2.8.23}$$

Now we insert $A_0$ into Equation 2.8.19, getting

$$F(A_0) = \begin{bmatrix}9&0\\0&9\end{bmatrix} - \begin{bmatrix}8&1\\-1&10\end{bmatrix} = \begin{bmatrix}1&-1\\1&-1\end{bmatrix}, \tag{2.8.24}$$

so that $|F(A_0)| = \sqrt{4} = 2$.

Now we need to compute $\|[\mathbf{D}F(A_0)]^{-1}\|^2$. Using Equation 2.8.22 and the
fact that $A_0$ is three times the identity, we get

$$[\mathbf{D}F(A_0)]B = A_0B + BA_0 = 3B + 3B = 6B. \tag{2.8.25}$$


Trying to solve Equation 2.8.19
without Newton’s method would 
be unpleasant. In a draft of this 
book we proposed a different ex- 
ample, finding a 2 x 2 matrix A 
such that 


A 2 + A = 


1 1 
1 1 


A friend pointed out that this 
problem can be solved explicitly 
(and more easily) without New- 
ton’s method, as Exercise 2.8.5 
asks you to do. 


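So we have

$$[\mathbf{D}F(A_0)]^{-1}B = \frac{B}{6}, \qquad \|[\mathbf{D}F(A_0)]^{-1}\| = \sup_{|B|=1} |B/6| = \sup_{|B|=1} \frac{|B|}{6} = \frac{1}{6}, \qquad \|[\mathbf{D}F(A_0)]^{-1}\|^2 = \frac{1}{36}. \tag{2.8.26}$$

The left-hand side of Equation 2.8.21 is $2 \cdot 2 \cdot 1/36 = 1/9$, and we see that
the inequality is satisfied with room to spare: if we start at $\begin{bmatrix}3&0\\0&3\end{bmatrix}$ and use
Newton’s method, we can compute the square root of $\begin{bmatrix}8&1\\-1&10\end{bmatrix}$. $\triangle$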
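To see Newton’s method at work on this matrix equation, here is a minimal Python sketch (our own illustration, with hypothetical helper names `matmul`, `solve` and `newton_step`). Each step solves $[\mathbf{D}F(A)]H = AH + HA = C - A^2$ for the increment $H$, written out as four linear equations in the four entries of $H$, and then replaces $A$ by $A + H$.

```python
def matmul(A, B):
    """Product of two 2 x 2 matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def solve(system, rhs):
    """Solve a small dense linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    rows = [list(system[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [x - factor * p for x, p in zip(rows[r], rows[col])]
    return [rows[i][n] / rows[i][i] for i in range(n)]

def newton_step(A, C):
    """Solve AH + HA = C - A^2 for H = [[p, q], [r, s]] and return A + H."""
    (a, b), (c, d) = A
    # Rows are the (1,1), (1,2), (2,1), (2,2) entries of AH + HA in terms of p, q, r, s.
    system = [[2 * a, c, b, 0.0],
              [b, a + d, 0.0, b],
              [c, 0.0, a + d, c],
              [0.0, c, b, 2 * d]]
    A2 = matmul(A, A)
    rhs = [C[0][0] - A2[0][0], C[0][1] - A2[0][1],
           C[1][0] - A2[1][0], C[1][1] - A2[1][1]]
    p, q, r, s = solve(system, rhs)
    return [[a + p, b + q], [c + r, d + s]]

C = [[8.0, 1.0], [-1.0, 10.0]]
A = [[3.0, 0.0], [0.0, 3.0]]  # the initial guess A_0 of Equation 2.8.20
for _ in range(5):
    A = newton_step(A, C)
print(A)
print(matmul(A, A))  # should reproduce C up to rounding
```

At $A_0 = 3I$ the linear system reduces to $6H = C - 9I$, which is Equation 2.8.25 read backwards.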


2.9 The Inverse and Implicit Function Theorems

In Section 2.2 we completely analyzed systems of linear equations. Given 
a system of nonlinear equations, what solutions do we have? What variables 
depend on others? Our tools for answering these questions are the implicit 
function theorem and its special case, the inverse function theorem. These two 
theorems are the backbone of differential calculus, just as their linear analogs, 
Theorem 2.2.4 and its special case, Theorem 2.2.5, are the backbone of linear 
algebra. We will start with inverse functions, and then move to the more general 
case. 


The inverse and implicit func- 
tion theorems are a lot harder 
than the corresponding linear the- 
orems, but most of the hard work 
is contained in the proof of Kan- 
torovitch’s theorem concerning 
the convergence of Newton’s me- 
thod. 


Inverse functions in one dimension 

An inverse function is a function that “undoes” the original function. If $f(x) = 2x$,
clearly there is a function $g$ such that $g(f(x)) = x$, namely, $g(y) = y/2$. Usually finding
an inverse isn’t so straightforward. But the basic condition for a continuous
function in one variable to have an inverse is simple: the function must be
monotone.


“Implicit” means “implied.”
The statement $2x - 8 = 0$ implies
that $x = 4$; it does not say it explicitly
(directly).

Definition 2.9.1 (Monotone function). A function is monotone if its
graph always goes up or always goes down: if $x < y$ always implies $f(x) < f(y)$,
the function is monotone increasing; if $x < y$ always implies $f(x) > f(y)$,
the function is monotone decreasing.

The function $f(x) = 2x + \sin x$
is monotone increasing; it has an
inverse function $g(2x + \sin x) = x$,
but finding it requires solving the


If a function $f$ that expresses $x$ in terms of $y$ is monotone, then its inverse
function $g$ exists expressing $y$ in terms of $x$. In addition you can find $g(y)$ by a
series of guesses that converge to the solution, and knowing the derivative of $f$
tells you how to compute the derivative of $g$.

More precisely: 

Theorem 2.9.2 (Inverse function theorem in one dimension). Let
$f : [a, b] \to [c, d]$ be a continuous function with $f(a) = c$, $f(b) = d$ and with
$f$ increasing (or decreasing) on $[a, b]$. Then:

(a) There exists a unique continuous function $g : [c, d] \to [a, b]$ such that

$$f(g(y)) = y, \quad\text{for all } y \in [c, d], \text{ and} \tag{2.9.1}$$

$$g(f(x)) = x, \quad\text{for all } x \in [a, b]. \tag{2.9.2}$$

(b) You can find $g(y)$ by solving the equation $y - f(x) = 0$ for $x$ by bisection
(described below).

(c) If $f$ is differentiable at $x \in (a, b)$, and $f'(x) \ne 0$, then $g$ is differentiable
at $f(x)$, and its derivative satisfies $g'(f(x)) = 1/f'(x)$.


equation $2x + \sin x = y$, with $x$ the
unknown and $y$ known. This can
be done, but it requires an approximation
technique; you can’t find
a formula for the solution using algebra,
trigonometry or even more
advanced techniques.

Part (c) justifies the use of implicit
differentiation: such statements as

$$\arcsin'(x) = \frac{1}{\sqrt{1 - x^2}}.$$

You are asked to prove Theorem 2.9.2 in Exercise 2.9.1. 

Example 2.9.3 (An inverse function in one dimension). Take $f(x) = 2x + \sin x$,
shown in Figure 2.9.1, and choose $[a, b] = [-k\pi, k\pi]$ for some positive
integer $k$. Then

$$f(a) = f(-k\pi) = -2k\pi + \underbrace{\sin(-k\pi)}_{=0} \quad\text{and}\quad f(b) = f(k\pi) = 2k\pi + \underbrace{\sin(k\pi)}_{=0}; \tag{2.9.3}$$

i.e., $f(a) = 2a$ and $f(b) = 2b$, and since $f'(x) = 2 + \cos x$, which is $\ge 1$, we
see that $f$ is strictly increasing. Thus Theorem 2.9.2 says that $y = 2x + \sin x$
expresses $x$ implicitly as a function of $y$ for $y \in [-2k\pi, 2k\pi]$: there is a function
$g : [-2k\pi, 2k\pi] \to [-k\pi, k\pi]$ such that $g(f(x)) = g(2x + \sin x) = x$.

But if you take a hardnosed attitude and say, “Okay, so what is $g(1)$?”, you
will see that this question is not so easy to answer. The equation $1 = 2x + \sin x$
is not a particularly hard equation to “solve,” but you can’t find a formula for
the solution using algebra, trigonometry or even more advanced techniques.
Instead you must apply some approximation technique. $\triangle$


In several variables the approximation technique we will use is Newton’s
method; in one dimension, we can use bisection. Suppose you want to solve
$f(x) = y$, and you know $a$ and $b$ such that $f(a) < y$ and $f(b) > y$. First try the
$x$ in the middle of $[a, b]$, computing $f\!\left(\frac{a+b}{2}\right)$. If the answer is too small, try the
midpoint of the right half-interval; if the answer is too big, try the midpoint of
the left half-interval. Next choose the midpoint of the quarter-interval to the

Figure 2.9.2.

The function graphed above is
not monotone and has no global
inverse; the same value of $y$ gives
both $x = B$ and $x = C$. Similarly,
the same value of $y$ gives $x = A$
and $x = D$. But it has many local
inverses; the arc $AB$ and the arc
$CD$ both represent $x$ as a function
of $y$.


The inverse function theorem is 
really the “local” inverse function 
theorem, carefully specifying the 
domain and the image of the in- 
verse function. 


As in Section 1.7, we are using 
the derivative to linearize a non- 
linear problem. 

Fortunately, proving this con- 
structivist version of the inverse 
function theorem is no harder than 
proving the standard version 


right (if your answer was too small) or to the left (if your answer was too big).
The sequence of $x_n$ chosen this way will converge to $g(y)$.
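The bisection procedure just described answers the hardnosed question “what is $g(1)$?” of Example 2.9.3. The following Python sketch is our own illustration; the function name `inverse_by_bisection` and the tolerance are assumptions, and the code assumes $f$ is continuous and increasing with $f(a) \le y \le f(b)$.

```python
import math

def inverse_by_bisection(f, y, a, b, tolerance=1e-12):
    """Approximate g(y), where g is the inverse of an increasing function f on [a, b]."""
    while b - a > tolerance:
        mid = (a + b) / 2
        if f(mid) < y:
            a = mid  # answer too small: keep the right half-interval
        else:
            b = mid  # answer too big (or exact): keep the left half-interval
    return (a + b) / 2

def f(x):
    return 2 * x + math.sin(x)

x = inverse_by_bisection(f, 1.0, -math.pi, math.pi)  # [a, b] = [-k*pi, k*pi] with k = 1
print(x, f(x))
```

Each pass halves the interval, so the number of correct binary digits grows by one per step; this is the slow but reliable rate that Section 2.8 contrasts with superconvergence.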

Note, as shown in Figure 2.9.2, that if a function is not monotone, we cannot 
expect to find a global inverse function, but there will usually be monotone 
stretches of the function for which local inverse functions exist. 


Inverse functions in higher dimensions 

In one dimension, monotonicity of a function is a sufficient (and necessary) 
criterion for an inverse function to exist, and bisection can be used to solve the 
equations. The point of the inverse function theorem is to show that inverse 
functions exist in higher dimensions, even though monotonicity and bisection 
do not generalize. In higher dimensions, we can’t speak of a mapping always 
increasing or always decreasing. The requirement of monotonicity is replaced 
by the requirement that the derivative of the mapping be invertible. Bisection 
is replaced by Newton’s method. The theorem is a great deal harder in higher 
dimensions, and you should not expect to breeze through it. 

The inverse function theorem deals with the case where we have as many
equations as unknowns: $\mathbf{f}$ maps $U$ to $W$, where $U$ and $W$ are both subsets of
$\mathbb{R}^n$. By definition, $\mathbf{f}$ is invertible if the equation $\mathbf{f}(\mathbf{x}) = \mathbf{y}$ has a unique solution
$\mathbf{x} \in U$ for every $\mathbf{y} \in W$.

But generally we must be satisfied with asking, if $\mathbf{f}(\mathbf{x}_0) = \mathbf{y}_0$, in what neighborhood
of $\mathbf{y}_0$ does there exist a local inverse? The name “inverse function
theorem” is somewhat misleading. We said in Definition 1.3.3 that a transformation
has an inverse if it is both onto and one to one. Such an inverse is
global. Very often a mapping will not have a global inverse but it will have a
local inverse (or several local inverses): there will be a neighborhood $V \subset W$ of
$\mathbf{y}_0$ and a mapping $\mathbf{g} : V \to U$ such that $(\mathbf{f} \circ \mathbf{g})(\mathbf{y}) = \mathbf{y}$ for all $\mathbf{y} \in V$.

The statement of Theorem 2.9.4 is involved. The key message to retain is: 

If the derivative is invertible, the mapping is locally invertible. 

More precisely : 

If the derivative of a mapping $\mathbf{f}$ is invertible at some point $\mathbf{x}_0$, the mapping
is locally invertible in some neighborhood of the point $\mathbf{f}(\mathbf{x}_0)$.

All the rest is spelling out just what we mean by “locally” and “neighbor- 
hood.” The standard statement of the inverse function theorem doesn’t spell 
that out; it guarantees the existence of an inverse, in the abstract: the theorem 
is shorter, but also less useful. If you ever want to use Newton’s method to

We saw an example of local
vs. global inverses in Figure 2.9.2;
another example is $f(x) = x^2$.
First, any inverse function of $f$ can
only be defined on the image of
$f$, the positive real numbers. Second,
there are two such “inverses,”
$g_1(y) = +\sqrt{y}$ and $g_2(y) = -\sqrt{y}$,
and they both satisfy $f(g(y)) = y$,
but they do not satisfy

$g(f(x)) = x$.

However, $g_1$ is an inverse if the
domain of $f$ is restricted to $x \ge 0$,
and $g_2$ is an inverse if the domain
of $f$ is restricted to $x \le 0$.


The statement “suppose that 
the derivative L — [Df(xo)] is 
invertible” is the key condition of 
the theorem. 


We could write f (g(y)J = y as 
the composition 

(f°g)(y) =y- 

On first reading, skip the last 
sentence concerning the little ball 
with radius Ri, centered at xo. 

It is a minor point, and we will 
discuss it later. Do notice that we 
have two main balls, Wo centered 
at xo and V centered at yo = 
f(xo), as shown in Figure 2.9.3. 


The ball V gives a lower bound 
for the domain of g; the actual 
domain may be bigger. 


compute an inverse function, you’ll need to know in what neighborhood such a 
function exists. 23 


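Theorem 2.9.4 (The inverse function theorem). Let $W \subset \mathbb{R}^m$ be an open neighborhood of $\mathbf{x}_0$, and $\mathbf{f} : W \to \mathbb{R}^m$ be a continuously differentiable function. Set $\mathbf{y}_0 = \mathbf{f}(\mathbf{x}_0)$, and suppose that the derivative $L = [\mathbf{Df}(\mathbf{x}_0)]$ is invertible.

Let $R > 0$ be a number satisfying the following hypotheses:

(1) The ball $W_0$ of radius $2R|L^{-1}|$ and centered at $\mathbf{x}_0$ is contained in $W$.

(2) In $W_0$, the derivative satisfies the Lipschitz condition

$$\bigl|[\mathbf{Df}(\mathbf{u})] - [\mathbf{Df}(\mathbf{v})]\bigr| \le \underbrace{\frac{1}{2R|L^{-1}|^2}}_{\text{Lipschitz ratio}}\,|\mathbf{u} - \mathbf{v}|. \tag{2.9.4}$$

There then exists a unique continuously differentiable mapping $\mathbf{g}$ from the ball of radius $R$ centered at $\mathbf{y}_0$ (which we will denote $V$) to the ball $W_0$:

$$\mathbf{g} : V \to W_0, \tag{2.9.5}$$

such that

$$\mathbf{f}(\mathbf{g}(\mathbf{y})) = \mathbf{y} \quad\text{and}\quad [\mathbf{Dg}(\mathbf{y})] = [\mathbf{Df}(\mathbf{g}(\mathbf{y}))]^{-1}. \tag{2.9.6}$$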


Moreover, the image of $\mathbf{g}$ contains the ball of radius $R_1$ around $\mathbf{x}_0$, where

$$R_1 = 2R|L^{-1}|^2\left(\sqrt{|L|^2 + \frac{2}{|L^{-1}|^2}} - |L|\right). \tag{2.9.7}$$


The theorem tells us that if certain conditions are satisfied, then $\mathbf{f}$ has a local inverse function $\mathbf{g}$. The function $\mathbf{f}$ maps every point in the lumpy-shaped region $\mathbf{g}(V)$ to a point in $V$, and the inverse function $\mathbf{g}$ will undo that mapping, sending every point in $V$ to a point in $\mathbf{g}(V)$.

Note that not every point $\mathbf{f}(\mathbf{x})$ is in the domain of $\mathbf{g}$; as shown in Figure 2.9.3, $\mathbf{f}$ maps some points in $W$ to points outside of $V$. For this reason we had to write $\mathbf{f}(\mathbf{g}(\mathbf{y})) = \mathbf{y}$ in Equation 2.9.6, rather than $\mathbf{g}(\mathbf{f}(\mathbf{x})) = \mathbf{x}$. In addition, the function $\mathbf{f}$ may map more than one point to the same point in $V$, but only one can come from $W_0$ (and any point from $W_0$ must come from the subset $\mathbf{g}(V)$). But $\mathbf{g}$ maps a point in $V$ to only one point. (Indeed, if $\mathbf{g}$ mapped the same point in $V$ to more than one point, then $\mathbf{g}$ would not be a well-defined mapping, as discussed in Section 1.3.) Moreover, that point is in $W_0$. This may appear obvious from Figure 2.9.3; after all, $\mathbf{g}(V)$ is the image of $\mathbf{g}$ and we can see that $\mathbf{g}(V)$ is in $W_0$. But the picture is illustrating what we have to prove, not what is given; the punch line of the theorem is precisely that “. . . then there exists a unique continuously differentiable mapping from . . . $V$ to the ball $W_0$.”

23 But once your exams are over, you can safely forget the details of how to compute that neighborhood, as long as you remember (1) if the derivative is invertible, the mapping is locally invertible, and (2) that you can look up statements that spell out what “locally” means.



FIGURE 2.9.3. The function $\mathbf{f} : W \to \mathbb{R}^m$ maps every point in $\mathbf{g}(V)$ to a point in $V$; in particular, it sends $\mathbf{x}_0$ to $\mathbf{y}_0$. Its inverse function $\mathbf{g} : V \to W_0$ sends every point in $V$ to a point in $\mathbf{g}(V)$. Note that $\mathbf{f}$ can well map other points outside $W_0$ into $V$.


Do you still remember the main point of the theorem? Where is the mapping 

'(;)-CaO 

guaranteed by the inverse function theorem to be locally invertible? 24 

24 You’re not being asked to spell out how big a neighborhood “locally” refers to, so you can forget about $R$, $V$, etc. Remember, if the derivative of a mapping is invertible, the mapping is locally invertible. The derivative is

[-'(;)! - [£ 4l- 

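The formula for the inverse of a $2 \times 2$ matrix is

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \qquad A^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix},$$

and here $ad - bc = -2(x^2 + y^2)$, which is 0 only if $x = 0$ and $y = 0$. The function is locally invertible near every point except $\mathbf{f}\begin{pmatrix} 0 \\ 0 \end{pmatrix}$. To determine whether a larger matrix is invertible, use Theorem 2.3.2. Exercise 1.4.12 shows that a $3 \times 3$ matrix is invertible if its determinant is not 0.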






FIGURE 2.9.4. The graph of the “best Lipschitz constant” $M_R$ for $[\mathbf{Df}]$ on the ball of radius $2R|L^{-1}|$ increases with $R$, and the function $\dfrac{1}{2R|L^{-1}|^2}$ decreases. The inverse function theorem only guarantees an inverse on a neighborhood $V$ of radius $R$ when $M_R \le \dfrac{1}{2R|L^{-1}|^2}$.


In emphasizing the “main point” we don’t mean to suggest that the details 
are unimportant. They are crucial if you want to compute an inverse function, 
since they provide an effective algorithm for computing the inverse: Newton’s 
method. This requires knowing a lower bound for the natural domain of the 
inverse: where it is defined. To come to terms with the details, it may help to 
imagine different quantities as being big or little, and see how that affects the 
statement. First, in an ideal situation, would we want R to be big or little? 
We’d like it to be big, because then V will be big (remember R is the radius 
of V) and that will mean that the inverse function g is defined in a bigger 
neighborhood. What might keep $R$ from being big? First, look at condition (1) of the theorem. We need $W_0$ to be in $W$, the domain of $\mathbf{f}$. Since the radius of $W_0$ is $2R|L^{-1}|$, if $R$ is too big, $W_0$ may no longer fit in $W$.

That constraint is pretty clear. Condition (2) of the theorem is more delicate. Suppose that on $W$ the derivative $[\mathbf{Df}(\mathbf{x})]$ is locally Lipschitz. It will then be Lipschitz on each $W_0 \subset W$, but with a best Lipschitz constant $M_R$ which starts out at some probably non-zero value when $W_0$ is just a point (i.e., when $R = 0$), and gets bigger and bigger as $R$ increases (it’s harder to satisfy a Lipschitz ratio over a large area than a small one). On the other hand, the quantity $1/(2R|L^{-1}|^2)$ starts at infinity when $R = 0$, and decreases as $R$ increases (see Figure 2.9.4). So Inequality 2.9.4 will be satisfied when $R$ is small; but usually the graphs of $M_R$ and $1/(2R|L^{-1}|^2)$ will cross for some $R_0$, and the inverse function theorem does not guarantee the existence of an inverse in any $V$ with radius larger than $R_0$.

The conditions imposed on $R$ may look complicated; do we need to worry that maybe no suitable $R$ exists? The answer is no. If $\mathbf{f}$ is differentiable, and the derivative is Lipschitz (with any Lipschitz ratio) in some neighborhood of $\mathbf{x}_0$, then the function $M_R$ exists, so the hypotheses on $R$ will be satisfied as soon as $R < R_0$. Thus a differentiable map with Lipschitz derivative has a local inverse near any point where the derivative is invertible: if $L^{-1}$ exists, we can find an $R$ that works.


Do we really have to check that the derivative of a function is Lipschitz? The answer is no: as we will see in Corollary 2.7.8, if the second partial derivatives of $\mathbf{f}$ are continuous, then the derivative is automatically Lipschitz in some neighborhood of $\mathbf{x}_0$. Often this is enough.

The main difficulty in applying those principles is that $M_R$ is usually very hard to compute, and $|L^{-1}|$, although usually easier, may be unpleasant too.

Remark. The standard statement of the inverse function theorem, which guarantees the existence of an inverse function in the abstract, doesn’t require the derivative to be Lipschitz, just continuous. 25 Because we want a lower bound for $R$ (to know how big $V$ is), we must impose some condition about how the derivative is continuous. We chose the Lipschitz condition because we want to use Newton’s method to compute the inverse function, and Kantorovitch’s theorem requires the Lipschitz condition.

25 Requiring that the derivative be continuous is necessary, as you can see by looking at Example 1.9.3, in which we described a function whose partial derivatives are not continuous at the origin; see Exercise 2.9.2.

Newton’s method applied to the equation $\mathbf{f}_{\mathbf{y}}(\mathbf{x}) = \mathbf{0}$ starting at $\mathbf{x}_0$ converges to a root in $U_0$. Kantorovitch’s theorem tells us this is the unique root in $U_0$; the inverse function theorem tells us that it is the unique root in all of $W_0$.

We get the first equality in Equation 2.9.10 by plugging in appropriate values to the definition of $\mathbf{h}_0$ given in the statement of Kantorovitch’s theorem (Equation 2.7.46):

$$\mathbf{h}_0 = -[\mathbf{Df}(\mathbf{a}_0)]^{-1}\mathbf{f}(\mathbf{a}_0).$$

Recall that in Equation 2.9.10 we write $\mathbf{h}_0(\mathbf{y})$ rather than $\mathbf{h}_0$ because our problem depends on $\mathbf{y}$: we are solving $\mathbf{f}_{\mathbf{y}} = \mathbf{0}$.

Proof of the inverse function theorem 

We show below that if the conditions of the inverse function theorem are sat- 
isfied, then Kantorovitch’s theorem applies, and Newton’s method can be used 
to find the inverse function. 

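Given $\mathbf{y} \in V$, we want to find $\mathbf{x}$ such that $\mathbf{f}(\mathbf{x}) = \mathbf{y}$. Since we wish to use Newton’s method, we will restate the problem: Define

$$\mathbf{f}_{\mathbf{y}}(\mathbf{x}) \overset{\text{def}}{=} \mathbf{f}(\mathbf{x}) - \mathbf{y} = \mathbf{0}. \tag{2.9.8}$$

We wish to solve the equation $\mathbf{f}_{\mathbf{y}}(\mathbf{x}) = \mathbf{0}$ for $\mathbf{y} \in V$, using Newton’s method with initial point $\mathbf{x}_0$.

We will use the notation of Theorem 2.7.11, but since the problem depends on $\mathbf{y}$, we will write $\mathbf{h}_0(\mathbf{y})$, $U_0(\mathbf{y})$, etc. Note that

$$[\mathbf{Df}_{\mathbf{y}}(\mathbf{x}_0)] = [\mathbf{Df}(\mathbf{x}_0)] = L \quad\text{and}\quad \mathbf{f}_{\mathbf{y}}(\mathbf{x}_0) = \underbrace{\mathbf{f}(\mathbf{x}_0)}_{=\,\mathbf{y}_0} - \mathbf{y} = \mathbf{y}_0 - \mathbf{y}, \tag{2.9.9}$$

so that

$$\mathbf{h}_0(\mathbf{y}) = -[\mathbf{Df}_{\mathbf{y}}(\mathbf{x}_0)]^{-1}\mathbf{f}_{\mathbf{y}}(\mathbf{x}_0) = -L^{-1}(\mathbf{y}_0 - \mathbf{y}). \tag{2.9.10}$$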


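This implies that $|\mathbf{h}_0(\mathbf{y})| \le |L^{-1}|R$, since $\mathbf{y}_0$ is the center of $V$, $\mathbf{y}$ is in $V$, and the radius of $V$ is $R$, giving $|\mathbf{y}_0 - \mathbf{y}| \le R$. Now we compute $\mathbf{x}_1 = \mathbf{x}_0 + \mathbf{h}_0(\mathbf{y})$ (as in Equation 2.7.46, where $\mathbf{a}_1 = \mathbf{a}_0 + \mathbf{h}_0$). Since $|\mathbf{h}_0(\mathbf{y})|$ is at most half the radius of $W_0$ (i.e., half $2R|L^{-1}|$), we see that $U_0(\mathbf{y})$ (the ball of radius $|\mathbf{h}_0(\mathbf{y})|$ centered at $\mathbf{x}_1$) is contained in $W_0$, as suggested by Figure 2.9.5.

Now we see that the Kantorovitch inequality (Equation 2.7.48) is satisfied:

$$\underbrace{|\mathbf{f}_{\mathbf{y}}(\mathbf{x}_0)|}_{|\mathbf{y}_0 - \mathbf{y}| \,\le\, R}\;\underbrace{\bigl|[\mathbf{Df}(\mathbf{x}_0)]^{-1}\bigr|^2}_{|L^{-1}|^2}\;M \;\le\; R\,|L^{-1}|^2\,\underbrace{\frac{1}{2R|L^{-1}|^2}}_{M} = \frac{1}{2}. \tag{2.9.11}$$

Thus Newton’s method applied to the equation $\mathbf{f}_{\mathbf{y}}(\mathbf{x}) = \mathbf{0}$ starting at $\mathbf{x}_0$ converges; denote the limit by $\mathbf{g}(\mathbf{y})$. Certainly on $V$, $\mathbf{f} \circ \mathbf{g}$ is the identity: as we have just shown, $\mathbf{f}(\mathbf{g}(\mathbf{y})) = \mathbf{y}$. □

We now have our inverse function $\mathbf{g}$. A complete proof requires showing that $\mathbf{g}$ is continuously differentiable. This is shown in Appendix A.4.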
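The construction in the proof, Newton's method applied to f(x) - y = 0 starting at x0, can be sketched in a few lines of Python for maps from R^2 to R^2. This is only an illustration under stated assumptions: the names `solve_2x2`, `newton_inverse`, `square_map` and `square_map_derivative` are hypothetical, the sample map is not taken from the text, and the code makes no attempt to check the hypotheses of Theorem 2.9.4 (it does not compute R or a Lipschitz ratio).

```python
def solve_2x2(a, b, c, d, r1, r2):
    """Solve [[a, b], [c, d]] h = (r1, r2) with the 2 x 2 inverse formula."""
    det = a * d - b * c
    if det == 0:
        raise ZeroDivisionError("derivative is not invertible at this point")
    return ((d * r1 - b * r2) / det, (a * r2 - c * r1) / det)


def newton_inverse(f, Df, x0, y, steps=20):
    """Approximate g(y) by Newton's method on f(x) - y = 0, starting at x0."""
    x = x0
    for _ in range(steps):
        fx = f(x)
        (a, b), (c, d) = Df(x)
        # Newton step: h = -[Df(x)]^{-1} (f(x) - y)
        h = solve_2x2(a, b, c, d, y[0] - fx[0], y[1] - fx[1])
        x = (x[0] + h[0], x[1] + h[1])
    return x


# Illustrative map (an assumption, not from the text): (x1, x2) -> (x1^2 - x2^2, 2 x1 x2).
def square_map(x):
    return (x[0] ** 2 - x[1] ** 2, 2 * x[0] * x[1])


def square_map_derivative(x):
    return ((2 * x[0], -2 * x[1]), (2 * x[1], 2 * x[0]))


# x0 = (1, 0) is sent to y0 = (1, 0); we look for the preimage of a nearby y.
g_y = newton_inverse(square_map, square_map_derivative, (1.0, 0.0), (1.1, 0.1))
print(g_y, square_map(g_y))
```

The fixed number of steps keeps the sketch short; a more careful version would stop once the step size falls below a tolerance.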

Example 2.9.5 (Where is f invertible?). Where is the function

$$\mathbf{f}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \sin(x + y) \\ x^2 - y^2 \end{pmatrix} \tag{2.9.12}$$

locally invertible? The derivative is

$$\left[\mathbf{Df}\begin{pmatrix} x \\ y \end{pmatrix}\right] = \begin{bmatrix} \cos(x + y) & \cos(x + y) \\ 2x & -2y \end{bmatrix}, \tag{2.9.13}$$

which is invertible if $-2y\cos(x + y) - 2x\cos(x + y) \ne 0$. (Remember the formula for the inverse of a $2 \times 2$ matrix. 26) So $\mathbf{f}$ is locally invertible at all points $\begin{pmatrix} x \\ y \end{pmatrix}$ that satisfy $-y \ne x$ and $\cos(x + y) \ne 0$ (i.e., $x + y \ne \pi/2 + k\pi$). △

FIGURE 2.9.6. Top: The square $-.6 \le x \le .6$, $-2.2 \le y \le -1$. Bottom: Its image under the mapping $\mathbf{f}$ of Example 2.9.5. Note that the square is folded over itself along the line $x + y = -\pi/2$ (the line from B to D); $\mathbf{f}$ is not invertible in the neighborhood of the square.
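The invertibility condition of Example 2.9.5 is easier to read once the determinant is factored. The following short computation is a worked restatement of that condition, using only the derivative of Equation 2.9.13:

```latex
\det\left[\mathbf{Df}\begin{pmatrix} x \\ y \end{pmatrix}\right]
  = \cos(x+y)\,(-2y) - \cos(x+y)\,(2x)
  = -2\,(x+y)\,\cos(x+y).
```

The product vanishes exactly when $x + y = 0$ or $\cos(x+y) = 0$, which are the two excluded cases in the example.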


Remark. We strongly recommend using a computer to understand the mapping $\mathbf{f} : \mathbb{R}^2 \to \mathbb{R}^2$ of Example 2.9.5 and, more generally, any mapping from $\mathbb{R}^2$ to $\mathbb{R}^2$. (One thing we can say without a computer’s help is that the first coordinate of every point in the image of $\mathbf{f}$ cannot be bigger than 1 or less than $-1$, since the sine function oscillates between $-1$ and 1. So if we graph the image using $x, y$ coordinates, it will be contained in a band between $x = -1$ and $x = 1$.) Figures 2.9.6 and 2.9.7 show just two examples of regions of the domain of $\mathbf{f}$ and the corresponding region of the image. Figure 2.9.6 shows a region of the image that is folded over; in that region the function has no inverse. △

FIGURE 2.9.7. The function $\mathbf{f}$ of Example 2.9.5 maps the region at left to the region at right. In this region, $\mathbf{f}$ is invertible.


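Example 2.9.6. Let $C_1$ be the circle of radius 3 centered at the origin in $\mathbb{R}^2$, and $C_2$ be the circle of radius 1 centered at $\begin{pmatrix} 10 \\ 0 \end{pmatrix}$. What is the locus of centers of line segments drawn from a point of $C_1$ to a point of $C_2$?

The center of the segment joining

$$\begin{pmatrix} 3\cos\theta \\ 3\sin\theta \end{pmatrix} \in C_1 \quad\text{to}\quad \begin{pmatrix} \cos\varphi + 10 \\ \sin\varphi \end{pmatrix} \in C_2 \tag{2.9.14}$$

is the point

$$\mathbf{F}\begin{pmatrix} \theta \\ \varphi \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 3\cos\theta + \cos\varphi + 10 \\ 3\sin\theta + \sin\varphi \end{pmatrix}. \tag{2.9.15}$$

We want to find the image of $\mathbf{F}$. A point $\mathbf{F}\begin{pmatrix} \theta \\ \varphi \end{pmatrix}$ where $\left[\mathbf{DF}\begin{pmatrix} \theta \\ \varphi \end{pmatrix}\right]$ is invertible will certainly be in the interior of the image (since points in the neighborhood of that point are also in the image), so the candidates to be in the boundary of the image are those points $\mathbf{F}\begin{pmatrix} \theta \\ \varphi \end{pmatrix}$ where $\left[\mathbf{DF}\begin{pmatrix} \theta \\ \varphi \end{pmatrix}\right]$ is not invertible. Since

$$\det\left[\mathbf{DF}\begin{pmatrix} \theta \\ \varphi \end{pmatrix}\right] = \frac{1}{4}\det\begin{bmatrix} -3\sin\theta & -\sin\varphi \\ 3\cos\theta & \cos\varphi \end{bmatrix} = -\frac{3}{4}(\sin\theta\cos\varphi - \cos\theta\sin\varphi) = -\frac{3}{4}\sin(\theta - \varphi), \tag{2.9.16}$$

which vanishes when $\theta = \varphi$ and when $\theta = \varphi + \pi$, we see that the candidates for the boundary of the image are the points

$$\mathbf{F}\begin{pmatrix} \theta \\ \theta \end{pmatrix} = \begin{pmatrix} 2\cos\theta + 5 \\ 2\sin\theta \end{pmatrix} \quad\text{and}\quad \mathbf{F}\begin{pmatrix} \theta \\ \theta - \pi \end{pmatrix} = \begin{pmatrix} \cos\theta + 5 \\ \sin\theta \end{pmatrix}, \tag{2.9.17}$$

i.e., the circles of radius 2 and 1 centered at $\mathbf{p} = \begin{pmatrix} 5 \\ 0 \end{pmatrix}$. The only regions whose boundaries are subsets of these sets are the whole disk of radius 2 and the annular region between the two circles. We claim that the image of $\mathbf{F}$ is the annular region, since the symmetric of $C_2$ with respect to $\mathbf{p}$ is the circle of radius 1 centered at the origin, which does not intersect $C_1$, so $\mathbf{p}$ is not in the image of $\mathbf{F}$. △

26 The inverse of a $2 \times 2$ matrix is given by $\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \dfrac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$.

It follows from the formula for the inverse of a $2 \times 2$ matrix that the matrix is not invertible if its determinant is 0; Exercise 1.4.12 shows that the same is true of $3 \times 3$ matrices. Theorem 4.8.6 generalizes this to $n \times n$ matrices.

Does Example 2.9.6 seem artificial? It’s not. Problems like this come up all the time in robotics; the question of knowing where a robot arm can reach is a question just like this.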
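The claim that the image of F is the annulus between the circles of radius 1 and 2 around p = (5, 0) can be checked numerically by sampling. The sketch below is a sanity check rather than a proof; the grid size `n` and the variable names are arbitrary choices made for this illustration.

```python
import math


def F(theta, phi):
    """Midpoint of the segment from (3cos(theta), 3sin(theta)) on C1
    to (cos(phi) + 10, sin(phi)) on C2."""
    return ((3 * math.cos(theta) + math.cos(phi) + 10) / 2,
            (3 * math.sin(theta) + math.sin(phi)) / 2)


n = 200  # even, so that theta = phi + pi occurs exactly on the grid
distances = []
for i in range(n):
    for j in range(n):
        px, py = F(2 * math.pi * i / n, 2 * math.pi * j / n)
        distances.append(math.hypot(px - 5.0, py))  # distance to p = (5, 0)

d_min, d_max = min(distances), max(distances)
print(d_min, d_max)
```

Every sampled midpoint lies at a distance between 1 and 2 from p, and both extremes are attained on the grid, which agrees with the annular region described in the example.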


Example 2.9.7 (Quantifying “locally”). Now let’s return to the function $\mathbf{f}$ of Example 2.9.5; let’s choose a point $\mathbf{x}_0$ where the derivative is invertible and see in how big a neighborhood of $\mathbf{f}(\mathbf{x}_0)$ an inverse function is guaranteed to exist. We know from Example 2.9.5 that the derivative is invertible at $\mathbf{x}_0 = \begin{pmatrix} 0 \\ \pi \end{pmatrix}$. This gives $L = \left[\mathbf{Df}\begin{pmatrix} 0 \\ \pi \end{pmatrix}\right] = \begin{bmatrix} -1 & -1 \\ 0 & -2\pi \end{bmatrix}$, so

$$L^{-1} = \frac{1}{2\pi}\begin{bmatrix} -2\pi & 1 \\ 0 & -1 \end{bmatrix}, \quad\text{and}\quad |L^{-1}|^2 = \frac{4\pi^2 + 2}{4\pi^2}. \tag{2.9.18}$$

Next we need to compute the Lipschitz ratio $M$ (Equation 2.9.4). We have

$$\begin{aligned}
\bigl|[\mathbf{Df}(\mathbf{u})] - [\mathbf{Df}(\mathbf{v})]\bigr|
&= \left|\begin{bmatrix} \cos(u_1 + u_2) - \cos(v_1 + v_2) & \cos(u_1 + u_2) - \cos(v_1 + v_2) \\ 2(u_1 - v_1) & 2(v_2 - u_2) \end{bmatrix}\right| \\
&\le \left|\begin{bmatrix} u_1 + u_2 - v_1 - v_2 & u_1 + u_2 - v_1 - v_2 \\ 2(u_1 - v_1) & 2(v_2 - u_2) \end{bmatrix}\right| \\
&= \sqrt{2\bigl((u_1 - v_1) + (u_2 - v_2)\bigr)^2 + 4\bigl((u_1 - v_1)^2 + (u_2 - v_2)^2\bigr)} \\
&\le \sqrt{4\bigl((u_1 - v_1)^2 + (u_2 - v_2)^2\bigr) + 4\bigl((u_1 - v_1)^2 + (u_2 - v_2)^2\bigr)} \\
&= \sqrt{8}\,|\mathbf{u} - \mathbf{v}|.
\end{aligned} \tag{2.9.19}$$

Our Lipschitz ratio $M$ is thus $\sqrt{8} = 2\sqrt{2}$, allowing us to compute $R$:

$$\frac{1}{2R|L^{-1}|^2} = 2\sqrt{2}, \quad\text{so}\quad R = \frac{4\pi^2}{4\sqrt{2}(4\pi^2 + 2)} \approx 0.16825. \tag{2.9.20}$$

We say “guaranteed to exist” because the actual domain of the inverse function may be larger than the ball $V$.

For the first inequality of Equation 2.9.19, remember that $|\cos a - \cos b| \le |a - b|$, and set $a = u_1 + u_2$ and $b = v_1 + v_2$. In going from the first square root to the second, we use $(a + b)^2 \le 2(a^2 + b^2)$, setting $a = u_1 - v_1$ and $b = u_2 - v_2$.

Since the domain $W$ of $\mathbf{f}$ is $\mathbb{R}^2$, the value of $R$ in Equation 2.9.20 clearly satisfies the requirement that the ball $W_0$ with radius $2R|L^{-1}|$ be in $W$.

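The minimum domain $V$ of our inverse function is a ball with radius $\approx 0.17$. What does this say about actually computing an inverse? For example, since

$$\mathbf{f}\begin{pmatrix} 0 \\ \pi \end{pmatrix} = \begin{pmatrix} 0 \\ -\pi^2 \end{pmatrix}, \quad\text{and}\quad \begin{pmatrix} 0 \\ -10 \end{pmatrix} \text{ is within } .17 \text{ of } \begin{pmatrix} 0 \\ -\pi^2 \end{pmatrix},$$

then the inverse function theorem tells us that by using Newton’s method we can solve for $\mathbf{x}$ the equation $\mathbf{f}(\mathbf{x}) = \begin{pmatrix} 0 \\ -10 \end{pmatrix}$. △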
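The numbers in Example 2.9.7 can be reproduced with a short script. The sketch below recomputes |L^{-1}|^2 and R for the function of Example 2.9.5 at (0, pi), takes the Lipschitz ratio 2*sqrt(2) from the example as given, and then runs Newton's method on f(x) = (0, -10) starting at (0, pi). The target point (0, -10) and all variable names are assumptions made for this illustration.

```python
import math


def f(x, y):
    return (math.sin(x + y), x * x - y * y)


def Df(x, y):
    c = math.cos(x + y)
    return ((c, c), (2 * x, -2 * y))


# L = [Df(0, pi)] and the squared length of its inverse (sum of squared entries).
(a, b), (c, d) = Df(0.0, math.pi)
det = a * d - b * c
L_inv = ((d / det, -b / det), (-c / det, a / det))
norm_sq = sum(entry * entry for row in L_inv for entry in row)

M = 2 * math.sqrt(2)          # Lipschitz ratio found in the example
R = 1 / (2 * M * norm_sq)     # solves 1 / (2 R |L^{-1}|^2) = M

# Distance from the assumed target to y0 = f(0, pi).
target = (0.0, -10.0)
y0 = f(0.0, math.pi)
dist = math.hypot(target[0] - y0[0], target[1] - y0[1])

# Newton's method for f(x) - target = 0, starting at x0 = (0, pi).
x, y = 0.0, math.pi
for _ in range(20):
    fx, fy = f(x, y)
    (a, b), (c, d) = Df(x, y)
    det = a * d - b * c
    r1, r2 = target[0] - fx, target[1] - fy
    x += (d * r1 - b * r2) / det
    y += (a * r2 - c * r1) / det

print(R, dist, (x, y), f(x, y))
```

Since `dist` is smaller than `R`, the target lies in the ball V on which the theorem guarantees an inverse, and the iteration settles on a point near (0, pi).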


The implicit function theorem 

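We have seen that the inverse function theorem deals with the case where we have $n$ equations in $n$ unknowns. Forgetting the detail, it says that if $U \subset \mathbb{R}^n$ is open, $\mathbf{f} : U \to \mathbb{R}^n$ is differentiable, $\mathbf{f}(\mathbf{x}_0) = \mathbf{y}_0$ and $[\mathbf{Df}(\mathbf{x}_0)]$ is invertible, then there exists a neighborhood $V$ of $\mathbf{y}_0$ and an inverse function $\mathbf{g} : V \to \mathbb{R}^n$ with $\mathbf{g}(\mathbf{y}_0) = \mathbf{x}_0$, and $\mathbf{f} \circ \mathbf{g}(\mathbf{y}) = \mathbf{y}$. Near $\begin{pmatrix} \mathbf{x}_0 \\ \mathbf{y}_0 \end{pmatrix}$, the equation $\mathbf{f}(\mathbf{x}) = \mathbf{y}$ (or equivalently, $\mathbf{f}(\mathbf{x}) - \mathbf{y} = \mathbf{0}$) expresses $\mathbf{x}$ implicitly as a function of $\mathbf{y}$.

Stated this way, there is no reason why the dimensions of the variables $\mathbf{x}$ and $\mathbf{y}$ should be the same.

Example 2.9.8 (Three variables, one equation). The equation $x^2 + y^2 + z^2 - 1 = 0$ expresses $z$ as an implicit function of $\begin{pmatrix} x \\ y \end{pmatrix}$ near $\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$. This implicit function can be made explicit: $z = \sqrt{1 - x^2 - y^2}$; you can solve for $z$ as a function of $x$ and $y$. △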



Recall that $C^1$ means continuously differentiable: differentiable with continuous derivative. We saw (Theorem 1.9.5) that this is equivalent to requiring that all the partial derivatives be continuous. As in the case of the inverse function theorem, Theorem 2.9.9 would not be true if we did not require $[\mathbf{DF}(\mathbf{x})]$ to be continuous with respect to $\mathbf{x}$. Exercise 2.9.2 shows what goes wrong in that case. But such functions are pathological; in practice you are unlikely to run into any.

Theorem 2.9.9 is true as stated, but the proof we give requires that the derivative be Lipschitz.

More generally, if we have $n$ equations in $n + m$ variables, we can think of $m$ variables as “known,” leaving $n$ equations in the $n$ “unknown” variables, and try to solve them. If a solution exists, then we will have expressed the $n$ unknown variables in terms of the $m$ known variables. In this case, the original equation expresses the $n$ unknown variables implicitly in terms of the others.

If all we want to know is that an implicit function exists on some unspecified neighborhood, then we can streamline the statement of the implicit function theorem; the important question to ask is, “is the derivative onto?”

Theorem 2.9.9 (Stripped-down version of the implicit function theorem). Let $U$ be an open subset of $\mathbb{R}^{n+m}$. Let $\mathbf{F} : U \to \mathbb{R}^n$ be a $C^1$ mapping such that $\mathbf{F}(\mathbf{c}) = \mathbf{0}$, and such that its derivative, the linear transformation $[\mathbf{DF}(\mathbf{c})]$, is onto. Then the system of linear equations $[\mathbf{DF}(\mathbf{c})](\mathbf{x}) = \mathbf{0}$ has $n$ pivotal variables and $m$ non-pivotal variables, and there exists a neighborhood of $\mathbf{c}$ for which $\mathbf{F} = \mathbf{0}$ implicitly defines the $n$ pivotal variables in terms of the $m$ non-pivotal variables.

The implicit function theorem thus says that locally, the mapping behaves like its derivative, i.e., like its linearization. Since $\mathbf{F}$ goes from a subset of $\mathbb{R}^{n+m}$ to $\mathbb{R}^n$, its derivative goes from $\mathbb{R}^{n+m}$ to $\mathbb{R}^n$. The derivative $[\mathbf{DF}(\mathbf{c})]$ being onto means that its columns span $\mathbb{R}^n$. Therefore $[\mathbf{DF}(\mathbf{c})]$ has $n$ pivotal columns and $m$ non-pivotal columns. We are then in the case (2b) of Theorem 2.2.4; we can choose freely the values of the $m$ non-pivotal variables; those values will determine the values of the $n$ pivotal variables. The theorem says that locally, what is true of the derivative of $\mathbf{F}$ is true of $\mathbf{F}$.

The full statement of the implicit function theorem 

In Sections 3.1 and 3.2, we will see that the stripped-down version of the implicit 
function theorem is enough to tell us when an equation defines a smooth curve, 
surface or higher dimensional analog. But in these days of computations, we 
often need to compute implicit functions; for those, having a precise bound on 
the domain is essential. For this we need the full statement. 

Note that in the long version of the theorem, we replace the condition that 
the derivative be continuous by a more demanding condition, requiring that the 
derivative be Lipschitz. Both conditions are ways of ensuring that the derivative 
not change too quickly. In exchange for the more demanding hypothesis, we 
get an explicit domain for the implicit function. 

The theorem is long and involved, so we’ll give some commentary.

The assumption that we are trying to express the first $n$ variables in terms of the last $m$ is a convenience; in practice the question of what to express in terms of what will depend on the context.

We represent by $\mathbf{a}$ the first $n$ coordinates of $\mathbf{c}$ and by $\mathbf{b}$ the last $m$ coordinates. For example, if $n = 2$ and $m = 1$, the point $\mathbf{c} \in \mathbb{R}^3$

might be 



with a = 



$\in \mathbb{R}^2$, and $\mathbf{b} = 2 \in \mathbb{R}$.


If it isn’t clear why $L$ is invertible, see Exercise 2.3.6.

The 0 stands for the $m \times n$ zero matrix; $I_m$ is the $m \times m$ identity matrix. So $L$ is $(n + m) \times (n + m)$. If it weren’t square, it would not be invertible.

Equation 2.9.25, which tells us how to compute the derivative of an implicit function, is important; we will use it often. What would we do with an implicit function if we didn’t know how to differentiate it?


First line, through the line immediately following Equation 2.9.21: Not only is $[\mathbf{DF}(\mathbf{c})]$ onto, but also the first $n$ columns of $[\mathbf{DF}(\mathbf{c})]$ are pivotal. (Since $\mathbf{F}$ goes from a subset of $\mathbb{R}^{n+m}$ to $\mathbb{R}^n$, so does $[\mathbf{DF}(\mathbf{c})]$. Since the matrix of Equation 2.9.21, formed by the first $n$ columns of that matrix, is invertible, the first $n$ columns of $[\mathbf{DF}(\mathbf{c})]$ are linearly independent, i.e., pivotal, and $[\mathbf{DF}(\mathbf{c})]$ is onto.)

The next sentence: We need the matrix $L$ to be invertible because we will use its inverse in the Lipschitz condition.

Definition of $W_0$: Here we get precise about neighborhoods.

Equation 2.9.23: This Lipschitz condition replaces the requirement in the stripped-down version that the derivative be continuous.

Equation 2.9.24: Here we define the implicit function $\mathbf{g}$.


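Theorem 2.9.10 (The implicit function theorem). Let $W$ be an open neighborhood of $\mathbf{c} = \begin{pmatrix} \mathbf{a} \\ \mathbf{b} \end{pmatrix} \in \mathbb{R}^{n+m}$, and $\mathbf{F} : W \to \mathbb{R}^n$ be differentiable, with $\mathbf{F}(\mathbf{c}) = \mathbf{0}$. Suppose that the $n \times n$ matrix

$$[D_1\mathbf{F}(\mathbf{c}), \dots, D_n\mathbf{F}(\mathbf{c})], \tag{2.9.21}$$

representing the first $n$ columns of the derivative of $\mathbf{F}$, is invertible.

Then the following matrix, which we denote $L$, is invertible also:

$$L = \begin{bmatrix} [D_1\mathbf{F}(\mathbf{c}), \dots, D_n\mathbf{F}(\mathbf{c})] & [D_{n+1}\mathbf{F}(\mathbf{c}), \dots, D_{n+m}\mathbf{F}(\mathbf{c})] \\ 0 & I_m \end{bmatrix}. \tag{2.9.22}$$

Let $W_0 = B_{2R|L^{-1}|}(\mathbf{c}) \subset \mathbb{R}^{n+m}$ be the ball of radius $2R|L^{-1}|$ centered at $\mathbf{c}$. Suppose that $R > 0$ satisfies the following hypotheses:

(1) It is small enough so that $W_0 \subset W$.

(2) In $W_0$, the derivative satisfies the Lipschitz condition

$$\bigl|[\mathbf{DF}(\mathbf{u})] - [\mathbf{DF}(\mathbf{v})]\bigr| \le \frac{1}{2R|L^{-1}|^2}\,|\mathbf{u} - \mathbf{v}|. \tag{2.9.23}$$

Then there exists a unique continuously differentiable mapping

$$\mathbf{g} : B_R(\mathbf{b}) \to B_{2R|L^{-1}|}(\mathbf{a}) \quad\text{such that}\quad \mathbf{F}\begin{pmatrix} \mathbf{g}(\mathbf{y}) \\ \mathbf{y} \end{pmatrix} = \mathbf{0} \text{ for all } \mathbf{y} \in B_R(\mathbf{b}), \tag{2.9.24}$$

and the derivative of the implicit function $\mathbf{g}$ at $\mathbf{b}$ is

$$[\mathbf{Dg}(\mathbf{b})] = -\underbrace{[D_1\mathbf{F}(\mathbf{c}), \dots, D_n\mathbf{F}(\mathbf{c})]^{-1}}_{\text{partial deriv. for pivotal variables}}\;\underbrace{[D_{n+1}\mathbf{F}(\mathbf{c}), \dots, D_{n+m}\mathbf{F}(\mathbf{c})]}_{\text{partial deriv. for non-pivotal variables}}. \tag{2.9.25}$$

Since the range of $\mathbf{F}$ is $\mathbb{R}^n$, saying that $[\mathbf{DF}(\mathbf{c})]$ is onto is the same as saying that it has rank $n$. Many authors state the implicit function theorem in terms of the rank.

The inverse function theorem is the special case of the implicit function theorem where we have $2n$ variables: the unknown $n$-dimensional variable $\mathbf{x}$ and the known $n$-dimensional variable $\mathbf{y}$, and where our original equation is $\mathbf{f}(\mathbf{x}) - \mathbf{y} = \mathbf{0}$; it is the case where we can separate out the $\mathbf{y}$ from $\mathbf{F}\begin{pmatrix} \mathbf{x} \\ \mathbf{y} \end{pmatrix}$.

There is a sneaky way of making the implicit function theorem be a special case of the inverse function theorem; we use this in our proof.

Equation 2.9.28: In the lower right-hand corner of $L$ we have the number 1, not the identity matrix $I$; our function $F$ goes from $\mathbb{R}^2$ to $\mathbb{R}$, so $n = m = 1$, and the $1 \times 1$ identity matrix is the number 1.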


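Summary. We assume that we have an $n$-dimensional (unknown) variable $\mathbf{x}$, an $m$-dimensional (known) variable $\mathbf{y}$, an equation $\mathbf{F} : \mathbb{R}^{n+m} \to \mathbb{R}^n$, and a point $\begin{pmatrix} \mathbf{a} \\ \mathbf{b} \end{pmatrix}$ such that $\mathbf{F}\begin{pmatrix} \mathbf{a} \\ \mathbf{b} \end{pmatrix} = \mathbf{0}$. We ask whether the equation $\mathbf{F}\begin{pmatrix} \mathbf{x} \\ \mathbf{y} \end{pmatrix} = \mathbf{0}$ expresses $\mathbf{x}$ implicitly in terms of $\mathbf{y}$ near $\begin{pmatrix} \mathbf{a} \\ \mathbf{b} \end{pmatrix}$. The implicit function theorem asserts that this is true if the linearized equation

$$\left[\mathbf{DF}\begin{pmatrix} \mathbf{a} \\ \mathbf{b} \end{pmatrix}\right]\begin{pmatrix} \mathbf{u} \\ \mathbf{v} \end{pmatrix} = \mathbf{0} \tag{2.9.26}$$

expresses $\mathbf{u}$ implicitly in terms of $\mathbf{v}$, which we know is true if the first $n$ columns of $\left[\mathbf{DF}\begin{pmatrix} \mathbf{a} \\ \mathbf{b} \end{pmatrix}\right]$ are linearly independent. △

The theorem is proved in Appendix A.5.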


Example 2.9.11 (The unit circle and the implicit function theorem). The unit circle is the set of points $\mathbf{c} = \begin{pmatrix} x \\ y \end{pmatrix}$ such that $F(\mathbf{c}) = 0$ when $F$ is the function $F\begin{pmatrix} x \\ y \end{pmatrix} = x^2 + y^2 - 1$. The function is differentiable, with derivative

$$\left[\mathbf{D}F\begin{pmatrix} a \\ b \end{pmatrix}\right] = [2a, 2b]. \tag{2.9.27}$$

In this case, the matrix of Equation 2.9.21 is the $1 \times 1$ matrix $[2a]$, so requiring it to be invertible simply means requiring $a \ne 0$.

Therefore, if $a \ne 0$, the stripped-down version guarantees that in some neighborhood of $\begin{pmatrix} a \\ b \end{pmatrix}$, the equation $x^2 + y^2 - 1 = 0$ implicitly expresses $x$ as a function of $y$. (Similarly, if $b \ne 0$, then in some neighborhood of $\begin{pmatrix} a \\ b \end{pmatrix}$ the equation $x^2 + y^2 - 1 = 0$ expresses implicitly $y$ as a function of $x$.)

Let’s see what the strong version of the implicit function theorem says about the domain of this implicit function.

The matrix $L$ of Equation 2.9.22 is

$$L = \begin{bmatrix} 2a & 2b \\ 0 & 1 \end{bmatrix}, \quad\text{and}\quad L^{-1} = \frac{1}{2a}\begin{bmatrix} 1 & -2b \\ 0 & 2a \end{bmatrix}. \tag{2.9.28}$$

So we have

$$|L^{-1}| = \frac{1}{2|a|}\sqrt{1 + 4a^2 + 4b^2} = \frac{\sqrt{5}}{2|a|}. \tag{2.9.29}$$

The derivative of $F$ is Lipschitz with Lipschitz ratio 2:

$$\left|\left[\mathbf{D}F\begin{pmatrix} u_1 \\ u_2 \end{pmatrix}\right] - \left[\mathbf{D}F\begin{pmatrix} v_1 \\ v_2 \end{pmatrix}\right]\right| = \bigl|[2u_1 - 2v_1, 2u_2 - 2v_2]\bigr| = 2\bigl|[u_1 - v_1, u_2 - v_2]\bigr| \le 2|\mathbf{u} - \mathbf{v}|, \tag{2.9.30}$$

so (by Equation 2.9.23) we can satisfy condition (2) by choosing an $R$ such that

$$2 = \frac{1}{2R|L^{-1}|^2}, \quad\text{i.e.,}\quad R = \frac{1}{4|L^{-1}|^2} = \frac{a^2}{5}. \tag{2.9.31}$$

We then see that $W_0$ is the ball of radius

$$2R|L^{-1}| = \frac{2a^2}{5}\cdot\frac{\sqrt{5}}{2|a|} = \frac{|a|}{\sqrt{5}}; \tag{2.9.32}$$

since $W$ is all of $\mathbb{R}^2$, condition (1) is satisfied.

Therefore, for all $\begin{pmatrix} a \\ b \end{pmatrix}$ when $a \ne 0$, the equation $x^2 + y^2 - 1 = 0$ expresses $x$ (in the interval of radius $|a|/\sqrt{5}$ around $a$) as a function of $y$ (in the interval of radius $a^2/5$ around $b$).

Of course we don’t need the implicit function theorem to understand the unit circle; we already knew that we could write $x = \pm\sqrt{1 - y^2}$. But let’s pretend we don’t, and go further. The implicit function theorem says that if we know that a point $\begin{pmatrix} a \\ b \end{pmatrix}$ is a root of the equation $x^2 + y^2 - 1 = 0$, then for any $y$ within $a^2/5$ of $b$, we can find the corresponding $x$ by starting with the guess $x_0 = a$ and applying Newton’s method, iterating

$$x_{n+1} = x_n - \frac{F\begin{pmatrix} x_n \\ y \end{pmatrix}}{D_1F\begin{pmatrix} x_n \\ y \end{pmatrix}} = x_n - \frac{x_n^2 + y^2 - 1}{2x_n}. \quad\triangle \tag{2.9.33}$$

Equation 2.9.31: Note the way the radius $R$ of the interval around $b$ shrinks, without ever disappearing, as $a \to 0$. At the point $\begin{pmatrix} 0 \\ \pm 1 \end{pmatrix}$, the equation $x^2 + y^2 - 1 = 0$ does not express $x$ in terms of $y$, but it does express $x$ in terms of $y$ when $a$ is arbitrarily close to 0.

Of course there are two possible $x$’s. One will be found by starting Newton’s method at $a$, the other by starting at $-a$.

In Equation 2.9.33 we write $1/D_1F$ rather than $(D_1F)^{-1}$ because $D_1F$ is a $1 \times 1$ matrix.
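The iteration of Equation 2.9.33 is short enough to run directly. The sketch below starts from a point (a, b) on the unit circle and computes x for a nearby y; the function name `implicit_x`, the sample point (0.6, 0.8) and the value y = 0.85 are assumptions chosen for this illustration so that |y - b| is smaller than the guaranteed radius a^2/5.

```python
import math


def implicit_x(a, y, steps=30):
    """Newton's method for x^2 + y^2 - 1 = 0 in the unknown x, starting at x0 = a."""
    x = a
    for _ in range(steps):
        # Equation 2.9.33: x_{n+1} = x_n - (x_n^2 + y^2 - 1) / (2 x_n)
        x = x - (x * x + y * y - 1) / (2 * x)
    return x


a, b = 0.6, 0.8               # a point on the unit circle
radius_y = a * a / 5          # guaranteed radius around b (0.072 here)
radius_x = abs(a) / math.sqrt(5)

y = 0.85                      # |y - b| = 0.05, inside the guaranteed interval
x = implicit_x(a, y)
print(x, math.sqrt(1 - y * y))
```

Starting the same iteration at -a instead of a gives the other root, as the margin note on the two possible x's points out.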

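Example 2.9.12 (An implicit function in several variables). In what neighborhood of $\begin{pmatrix} 0 \\ 0 \end{pmatrix}$ do the equations

$$\begin{aligned} x^2 - y &= a \\ y^2 - z &= b \\ z^2 - x &= 0 \end{aligned} \tag{2.9.34}$$

determine $\begin{pmatrix} x \\ y \\ z \end{pmatrix}$ as an implicit function $\mathbf{g}\begin{pmatrix} a \\ b \end{pmatrix}$, with $\mathbf{g}\begin{pmatrix} 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$? Here, $n = 3$, $m = 2$; the relevant function is $\mathbf{F} : \mathbb{R}^5 \to \mathbb{R}^3$, given by

$$\mathbf{F}\begin{pmatrix} x \\ y \\ z \\ a \\ b \end{pmatrix} = \begin{pmatrix} x^2 - y - a \\ y^2 - z - b \\ z^2 - x \end{pmatrix}; \tag{2.9.35}$$

the derivative of $\mathbf{F}$ is

$$\left[\mathbf{DF}\begin{pmatrix} x \\ y \\ z \\ a \\ b \end{pmatrix}\right] = \begin{bmatrix} 2x & -1 & 0 & -1 & 0 \\ 0 & 2y & -1 & 0 & -1 \\ -1 & 0 & 2z & 0 & 0 \end{bmatrix}, \tag{2.9.36}$$

and $M = 2$ is a global Lipschitz constant for this derivative:

$$\left|\begin{bmatrix} 2x_1 & -1 & 0 & -1 & 0 \\ 0 & 2y_1 & -1 & 0 & -1 \\ -1 & 0 & 2z_1 & 0 & 0 \end{bmatrix} - \begin{bmatrix} 2x_2 & -1 & 0 & -1 & 0 \\ 0 & 2y_2 & -1 & 0 & -1 \\ -1 & 0 & 2z_2 & 0 & 0 \end{bmatrix}\right| = 2\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2} \le 2\left|\begin{pmatrix} x_1 \\ y_1 \\ z_1 \\ a_1 \\ b_1 \end{pmatrix} - \begin{pmatrix} x_2 \\ y_2 \\ z_2 \\ a_2 \\ b_2 \end{pmatrix}\right|. \tag{2.9.37}$$

Setting $x = y = z = 0$ and adding the appropriate two bottom lines, we find that

$$L = \begin{bmatrix} 0 & -1 & 0 & -1 & 0 \\ 0 & 0 & -1 & 0 & -1 \\ -1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}, \qquad L^{-1} = \begin{bmatrix} 0 & 0 & -1 & 0 & 0 \\ -1 & 0 & 0 & -1 & 0 \\ 0 & -1 & 0 & 0 & -1 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}. \tag{2.9.38}$$

Since the function $\mathbf{F}$ is defined on all of $\mathbb{R}^5$, the first restriction on $R$ is vacuous. The second restriction requires that

$$\frac{1}{2R|L^{-1}|^2} \ge 2, \quad\text{i.e.,}\quad R \le \frac{1}{28}. \tag{2.9.39}$$

Thus we can be sure that for any $\begin{pmatrix} a \\ b \end{pmatrix}$ in the ball of radius 1/28 around the origin (i.e., satisfying $\sqrt{a^2 + b^2} < 1/28$), there will be a unique solution to Equation 2.9.34 with

$$\left|\begin{pmatrix} x \\ y \\ z \end{pmatrix}\right| \le \frac{2\sqrt{7}}{28} = \frac{1}{2\sqrt{7}}. \tag{2.9.40}$$

The $\begin{pmatrix} a \\ b \end{pmatrix}$ of this discussion is the $\mathbf{y}$ of Equation 2.9.24, and the origin here is the $\mathbf{b}$ of that equation.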
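The bound R = 1/28 and the solution promised by Example 2.9.12 can be checked numerically. The sketch below sums the squared entries of the matrix L^{-1} of Equation 2.9.38, recomputes R with the Lipschitz constant M = 2, and then solves the system of Equation 2.9.34 by Newton's method in the unknowns (x, y, z) for one sample (a, b). The helper `solve_linear` and the sample values a = 0.02, b = 0.01 are assumptions made for this illustration.

```python
import math


def solve_linear(A, r):
    """Solve A h = r by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(n):
            if i != col:
                factor = M[i][col] / M[col][col]
                M[i] = [mi - factor * mc for mi, mc in zip(M[i], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def F(x, y, z, a, b):
    return [x * x - y - a, y * y - z - b, z * z - x]


def D_xyz(x, y, z):
    """First three columns of [DF]: partial derivatives in x, y, z."""
    return [[2 * x, -1.0, 0.0], [0.0, 2 * y, -1.0], [-1.0, 0.0, 2 * z]]


L_inv = [[0, 0, -1, 0, 0],
         [-1, 0, 0, -1, 0],
         [0, -1, 0, 0, -1],
         [0, 0, 0, 1, 0],
         [0, 0, 0, 0, 1]]
norm_sq = sum(e * e for row in L_inv for e in row)   # |L^{-1}|^2
R = 1 / (2 * 2 * norm_sq)                            # from 1/(2 R |L^{-1}|^2) = 2

a, b = 0.02, 0.01            # sample point with sqrt(a^2 + b^2) < 1/28
x = y = z = 0.0
for _ in range(20):
    h = solve_linear(D_xyz(x, y, z), [-v for v in F(x, y, z, a, b)])
    x, y, z = x + h[0], y + h[1], z + h[2]

print(norm_sq, R, (x, y, z))
```

The computed (x, y, z) stays well inside the ball of radius 1/(2*sqrt(7)) given by Equation 2.9.40.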


2.10 Exercises for Chapter Two 


Exercises for Section 2.1: Row Reduction

2.1.1 (a) Write the following system of linear equations as the multiplication of a matrix by a vector, using the format of Exercise 1.2.2.

3x + y - 4z = 0
2y + z = 4
x - 3y = 1.

(b) Write the same system as a single matrix, using the shorthand notation discussed in Section 2.1.




(c) Write the following system of equations as a single matrix:

$x_1 - 7x_2 + 2x_3 = 1$
$x_1 - 3x_2 = 2$
$2x_1 - 2x_2 = -1$.

2.1.2 Write each of the following systems of equations as a single matrix:

(a)
3y - z = 0
-2x + y + 2z = 0
x - 5z = 0;

(b)
$2x_1 + 3x_2 - x_3 = 1$
$-2x_2 + x_3 = 2$
$x_1 - 2x_3 = -1$.

2.1.3 Show that the row operation that consists of exchanging two rows is not necessary; one can exchange rows using the other two row operations: (1) multiplying a row by a nonzero number, and (2) adding a multiple of a row onto another row.

2.1.4 Show that any row operation can be undone by another row operation. Note the importance of the word “nonzero” in the algorithm for row reduction.

2.1.5 For each of the four matrices in Example 2.1.7, find (and label) row operations that will bring them to echelon form.

2.1.6 Show that if $A$ is square, and $\tilde{A}$ is what you get after row reducing $A$ to echelon form, then either $\tilde{A}$ is the identity, or the last row is a row of zeroes.

2.1.7 Bring the following matrices to echelon form, using row operations. 



‘1 3-14] [1 11 f 

(d) 1 2 12 (e) 2 -3 3 3 

.37 1 9j [1-422 

2.1.8 For Example 2.1.10, analyze precisely where the troublesome errors occur.


In Exercise 2. 1.9 we use the fol- 
lowing rules: a single addition, 
multiplication, or division has unit 
cost; administration (i.e., relabel- 
ing entries when switching rows, 
and comparisons) is free. 


2.1.9 In this exercise, we will estimate how expensive it is to solve a system 
Ax — b of n equations in n unknowns, assuming that there is a unique solution, 
i.e., that A row reduces to the identity. In particular, we will see that partial 
row reduction and back substitution (to be defined below) is roughly a third 
cheaper than full row reduction. 

In the first part., we will show that the number of operations required to row 
reduce the augmented matrix [i4|b] is 


R(n) = n 3 -f n 2 / 2 - n/2. 
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Hint: There will be n - < 
1 divisions, (n — l)(n — k ■+ 
multiplications and (n - l)(n - 
1) additions. 


1 * * ... * 6 i 

0 1 * ... * 62 

000... 1 6„ 


Exercises for Section 2.2: 

Solving Equations 
with Row Reduction 


(a) Compute R(1),R(2), and show that this formula is correct when n = 1 
and 2. 

t + (b) Suppose that columns k - 1 each contain a pivotal 1, and that 

■ l) all other entries in those columns are 0. Show that you will require another 

(2n - l)(n - k + 1) operations for the same to be true of k. 

(c) Show that 

n - 1 )(n - fc + 1) = n 3 + y - y 

k-l 

Now we will consider an alternative approach, in which we will do all the steps
of row reduction, except that we do not make the entries above pivotal 1's be
0. We end up with a matrix of the form at left, where * stands for terms which
are whatever they are, usually nonzero. Putting the variables back in, when
n = 3, our system of equations might be

x + 2y - z = 2
    y - 3z = -1
         z = 5,

which can be solved by back substitution as follows:

z = 5, y = -1 + 3z = 14, x = 2 - 2y + z = 2 - 28 + 5 = -21.

We will show that partial row reduction and back substitution takes

\[ Q(n) = \frac{2}{3} n^3 + \frac{3}{2} n^2 - \frac{1}{6} n - 1 \text{ operations.} \]


(d) Compute Q(1), Q(2), Q(3). Show that Q(n) < R(n) when n > 3.

(e) Following the same steps as in part (b), show that the number of
operations needed to go from the (k - 1)th step to the kth step of partial row
reduction is $(n - k + 1)(2n - 2k + 1)$.

(f) Show that

\[ \sum_{k=1}^{n} (n - k + 1)(2n - 2k + 1) = \frac{2}{3} n^3 + \frac{1}{2} n^2 - \frac{1}{6} n. \]

(g) Show that the number of operations required by back substitution is
$n^2 - 1$.

(h) Compute Q(n).
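
The counts in Exercise 2.1.9 can be checked numerically. The following is a minimal sketch, not part of the exercise: it adds up the per-column costs quoted in parts (b), (e) and (g) and compares the totals with the closed forms for R(n) and Q(n). The function names are illustrative assumptions.

```python
from fractions import Fraction

def full_reduction_cost(n):
    """R(n): sum over columns k of (2n - 1)(n - k + 1), as in parts (b)-(c)."""
    return sum((2 * n - 1) * (n - k + 1) for k in range(1, n + 1))

def partial_reduction_cost(n):
    """Q(n): partial row reduction (part (f)) plus back substitution (part (g))."""
    elimination = sum((n - k + 1) * (2 * n - 2 * k + 1) for k in range(1, n + 1))
    back_substitution = n * n - 1
    return elimination + back_substitution

def R_closed(n):
    """Closed form n^3 + n^2/2 - n/2, in exact arithmetic."""
    return Fraction(n) ** 3 + Fraction(n * n, 2) - Fraction(n, 2)

def Q_closed(n):
    """Closed form (2/3)n^3 + (3/2)n^2 - (1/6)n - 1, in exact arithmetic."""
    return Fraction(2, 3) * n ** 3 + Fraction(3, 2) * n ** 2 - Fraction(n, 6) - 1

for n in (1, 2, 3, 4, 10, 100):
    assert full_reduction_cost(n) == R_closed(n)
    assert partial_reduction_cost(n) == Q_closed(n)
    print(n, full_reduction_cost(n), partial_reduction_cost(n))
```

For large n the ratio Q(n)/R(n) approaches 2/3, which is the sense in which the partial method is "roughly a third cheaper".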


2.2.1 Rewrite the system of equations in Example 2.2.3 so that y is the first
variable, z the second. Now what are the pivotal unknowns?

2.2.2 Predict whether each of the following systems of equations will have
a unique solution, no solution, or infinitely many solutions. Solve, using row
operations. If your results do not confirm your predictions, can you suggest an
explanation for the discrepancy?

(a) 2x + 13y - 3z = -7
    x + y = 1
    x + 7z = 22

(b) x - 2y - 12z = 12
    2x + 2y + 2z = 4
    2x + 3y + 4z = 3

(c) x + y + z = 5
    x - y - z = 4
    2x + 6y + 6z = 12

(d) x + 3y + z = 4
    -x - y + z = -1
    2x + 4y = 0

(e) x + 2y + z - 4w + v = 0
    x + 2y - z + 2w - v = 0
    2x + 4y + z - 5w + v = 0
    x + 2y + 3z - 10w + 2v = 0

2.2.3 Confirm the solution for 2.2.2 (e), without using row reduction. 

2.2.4 Compose a system of (n - 1) equations in n unknowns, in which b
contains a pivotal 1.

2.2.5 On how many parameters does the family of solutions for Exercise 
2.2.2 (e) depend? 

2.2.6 Symbolically row reduce the system of linear equations

x + y + 2z = 1
x - y + az = b
2x - bz = 0.

(a) For what values of a, b does the system have a unique solution? Infinitely
many solutions? No solutions?

(b) Which of the possibilities above correspond to open subsets of the (a, b)-
plane? Closed subsets? Neither?

For example, for k = 2 we are asking about the system of equations

1 -1
-2 2

2.2.7 (a) Row reduce the matrix

1 -1 3 

-2 2 -6 
0 2 5 

2 -6 -4 


(b) Let $\mathbf{v}_k$, k = 1, ..., 5 be the columns of A. What can you say about the
systems of equations

\[ [\mathbf{v}_1, \dots, \mathbf{v}_k]\, \mathbf{x} = \mathbf{v}_{k+1} \]

for k = 1, 2, 3, 4.

2.2.8 Given the system of equations

\[ \begin{aligned} x_1 - x_2 - x_3 - 3x_4 + x_5 &= 1 \\ x_1 + x_2 - 5x_3 - x_4 + 7x_5 &= 2 \\ -x_1 + 2x_2 + 2x_3 + 2x_4 + x_5 &= 0 \\ -2x_1 + 5x_2 - 4x_3 + 9x_4 + 7x_5 &= \beta, \end{aligned} \]

for what values of $\beta$ does the system have solutions? When solutions exist, give
values of the pivotal variables in terms of the non-pivotal variables.


Exercises for Section 2.3: 

Inverses and 
Elementary Matrices 


\[ A = \begin{bmatrix} 2 & 1 & 3 & a \\ 1 & -1 & 1 & b \\ 1 & 1 & 2 & c \end{bmatrix} \]

2.3.1 (a) Derive from Theorem 2.2.4 the fact that only square matrices can
have inverses. (b) Construct an example where $AB = I$, but $BA \ne I$.

2.3.2 (a) Row reduce symbolically the matrix A at left. 

(b) Compute the inverse of the matrix B at left. 

(c) What is the relation between the answers in parts (a) and (b)? 

2.3.3 Use $A^{-1}$ to solve the system of Example 2.2.10.


\[ B = \begin{bmatrix} 2 & 1 & 3 \\ 1 & -1 & 1 \\ 1 & 1 & 2 \end{bmatrix} \]


Matrices for Exercise 2.3.2 


\[ C = \begin{bmatrix} 1 & -2 & 4 \\ 0 & 5 & -5 \\ 3 & a & b \end{bmatrix} \]


Matrix for Exercise 2.3.5 


2.3.4 Find the inverse, or show it does not exist, for each of the following 
matrices: 


(a) 


(e) 


1 -5 
9 9 


i (b) 

1 3 
3 9 

; W 

*1 2 3' 

2 3 0 

; (d) 

*1 2' 
0 3 




0 1 2 


1 0 


0 1 
8 3 


2 -1 
1 
9 


; (0 


1 0 
2 1 
1 1 


(g) $\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 1 & 3 & 6 & 10 \\ 1 & 4 & 10 & 20 \end{bmatrix}$


2.3.5 (a) For what values of a and b is the matrix C at left invertible?

(b) For those values, compute the inverse. 


2.3.6 (a) Show that if A is an invertible n x n matrix, B is an invertible m x m
matrix, and C is any n x m matrix, then the (n + m) x (n + m) matrix

\[ \begin{bmatrix} A & C \\ 0 & B \end{bmatrix}, \]

where 0 stands for the m x n 0 matrix, is invertible.

(b) Find a formula for the inverse.

\[ A = \begin{bmatrix} 1 & -6 & 3 \\ 2 & -7 & 3 \\ 4 & -12 & 5 \end{bmatrix} \]

Matrix for Exercise 2.3.7

2.3.7 For the matrix A at left, (a) Compute the matrix product AA.

(b) Use the result in (a) to solve the system of equations

x - 6y + 3z = 5
2x - 7y + 3z = 7
4x - 12y + 5z = 11.

In both cases, remember that the elementary matrix goes on the
left of the matrix to be multiplied.


2.3.8 (a) Confirm that multiplying a matrix by a type 2 elementary matrix as 
described in Definition 2.3.5 is equivalent to adding rows or multiples of rows. 

(b) Confirm that multiplying a matrix by a type 3 elementary matrix is 
equivalent to switching rows. 

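2.3.9 (a) Predict the effect of multiplying the matrix
$\begin{bmatrix} 1 & 0 & -1 \\ 2 & 1 & 1 \\ 0 & 1 & 2 \end{bmatrix}$
by each of the elementary matrices, with the elementary matrix on the left.

(1) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
(2) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$
(3) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & 0 & 1 \end{bmatrix}$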


(b) Confirm your answer by carrying out the multiplication. 

(c) Redo part (a) and part (b) placing the elementary matrix on the right. 
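
The effect of left and right multiplication by elementary matrices can be tried out on a small example. This is only a sketch: the matrix `M` below is a hypothetical test matrix, not one from the exercises, and `matmul` is an illustrative helper.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]   # hypothetical test matrix

E_scale = [[1, 0, 0], [0, 3, 0], [0, 0, 1]]   # multiplies row 2 by 3
E_add   = [[1, 0, 0], [0, 1, 0], [2, 0, 1]]   # adds 2 * (row 1) to row 3
E_swap  = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]   # exchanges rows 2 and 3

print(matmul(E_scale, M))   # elementary matrix on the left: acts on rows
print(matmul(E_add, M))
print(matmul(E_swap, M))
print(matmul(M, E_swap))    # on the right: columns 2 and 3 are exchanged
```

Placing the elementary matrix on the left changes rows; placing it on the right changes columns, which is the point of part (c) and of Exercise 2.3.12.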

2.3.10 When A is the matrix at left, multiplication by what elementary ma- 
trix corresponds to: 

(a) Exchanging the first and second rows of A?

(b) Multiplying the fourth row of A by 3? 

(c) Adding 2 times the third row of A to the first row of A? 

2.3.11 (a) Predict the effect of multiplying the matrix B at left by each of 
the matrices. (The matrices below will be on the left.) 

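(1) $\begin{bmatrix} 1 & 0 & -3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
(2) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
(3) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$.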

(b) Verify your prediction by carrying out the multiplication. 

2.3.12 Show that column operations (Definition 2.1.11) can be achieved by 
multiplication on the right by an elementary matrix of type 1,2 and 3 respec- 
tively. 

2.3.13 Prove Proposition 2.3.7. 

2.3.14 Show that it is possible to switch rows using multiplication by only 
the first two types of elementary matrices, as described in Definition 2.3.5. 

2.3.15 Row reduce the matrices in Exercise 2.1.7, using elementary matrices. 

\[ \begin{bmatrix} 1 & 2 & 0 & 1 \\ 1 & 1 & 3 & 3 \\ 0 & 1 & 0 & 1 \\ 2 & 1 & 1 & 3 \end{bmatrix} \]

The matrix A of Exercise 2.3.10

Exercises for Section 2.4:
Linear Independence

2.4.1 Show that $\text{Sp}(\mathbf{v}_1, \dots, \mathbf{v}_k)$ is a subspace of $\mathbb{R}^n$ and is the smallest
subspace containing $\mathbf{v}_1, \dots, \mathbf{v}_k$.

2.4.2 Show that the following two statements are equivalent to saying that a
set of vectors $\mathbf{v}_1, \dots, \mathbf{v}_k$ is linearly independent:

(a) The only way to write the zero vector $\mathbf{0}$ as a linear combination of the
$\mathbf{v}_i$ is to use only zero coefficients.

(b) None of the $\mathbf{v}_i$ is a linear combination of the others.

2.4.3 Show that the standard basis vectors $\mathbf{e}_1, \dots, \mathbf{e}_n$ are linearly
independent.

2.4.4 (a) For vectors in $\mathbb{R}^2$, prove that the length squared of a vector is the
sum of the squares of its coordinates, with respect to any orthonormal basis:
i.e., that if $\mathbf{v}_1, \dots, \mathbf{v}_n$ and $\mathbf{w}_1, \dots, \mathbf{w}_n$ are two orthonormal bases, and
$a_1 \mathbf{v}_1 + \cdots + a_n \mathbf{v}_n = b_1 \mathbf{w}_1 + \cdots + b_n \mathbf{w}_n$, then $a_1^2 + \cdots + a_n^2 = b_1^2 + \cdots + b_n^2$.

(b) Prove the same thing for vectors in $\mathbb{R}^3$.

(c) Repeat for $\mathbb{R}^n$.


2.4.5 Consider the following vectors: 

’1' 

1 

i 

V 

2 

, and 

V 

1 


0 


1 


a 

m m 


(a) For what values of a are these three vectors linearly dependent? 

(b) Show that for each such a the three vectors lie in the same plane, and 
give an equation of the plane. 


Recall that Mat (n, m) denotes 
the set of n x m matrices. 


2.4.6 (a) Let $\mathbf{v}_1, \dots, \mathbf{v}_k$ be vectors in $\mathbb{R}^n$. What does it mean to say that
they are linearly independent? That they span $\mathbb{R}^n$? That they form a basis of
$\mathbb{R}^n$?

(b) Let $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$. Are the elements $I, A, A^2, A^3$ linearly independent in
Mat (2, 2)? What is the dimension of the subspace $V \subset$ Mat (2, 2) that they
span?

(c) Show that the set W of matrices $B \in$ Mat (2, 2) that satisfy AB = BA
is a subspace of Mat (2, 2). What is its dimension?

(d) Show that $V \subset W$. Are they equal?


2.4.7 Finish the proof that the three conditions in Definition 2.4.13 are equiv- 
alent: show that (2) implies (3) and (3) implies (1). 


2.4.8 Let V\ 


= rr 

~ [i 


and v 2 = 


Let x and y be the coordinates with respect to the standard basis
$\{\mathbf{e}_1, \mathbf{e}_2\}$ and let u and v be the coordinates with respect to
$\{\mathbf{v}_1, \mathbf{v}_2\}$. Write the equations to translate from (x, y) to (u, v) and
back. Use these equations to write the vector $\begin{bmatrix} 3 \\ -5 \end{bmatrix}$ in terms of $\mathbf{v}_1$ and $\mathbf{v}_2$.

Hint for Exercise 2.4.10, part (b): Work by induction on the number m of
columns. First check that it is true if m = 1. Next, suppose it is true for
m - 1, and view an n x m matrix as an augmented matrix, designed to solve n
equations in m - 1 unknowns.

After row reduction there is a pivotal 1 in the last column exactly if $\mathbf{a}_m$
is not in the span of $\mathbf{a}_1, \dots, \mathbf{a}_{m-1}$, and otherwise the entries of the
last column satisfy Equation 2.4.10.

(When figures and equations are numbered in the exercises, they are given
the number of the exercise to which they pertain.)

Exercise 2.4.12 says that any linearly independent set can be extended to
form a basis. In French treatments of linear algebra, this is called the
theorem of the incomplete basis; it plus induction can be used to prove all
the theorems of linear algebra in Chapter 2.


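2.4.9 Let $\mathbf{v}_1, \dots, \mathbf{v}_n$ be vectors in $\mathbb{R}^m$, and let $P_{\{\mathbf{v}\}} : \mathbb{R}^n \to \mathbb{R}^m$ be given by

\[ P_{\{\mathbf{v}\}} \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} = a_1 \mathbf{v}_1 + \cdots + a_n \mathbf{v}_n. \]

(a) Show that $\mathbf{v}_1, \dots, \mathbf{v}_n$ are linearly independent if and only if the map
$P_{\{\mathbf{v}\}}$ is one to one.

(b) Show that $\mathbf{v}_1, \dots, \mathbf{v}_n$ span $\mathbb{R}^m$ if and only if $P_{\{\mathbf{v}\}}$ is onto.

(c) Show that $\mathbf{v}_1, \dots, \mathbf{v}_n$ is a basis of $\mathbb{R}^m$ if and only if $P_{\{\mathbf{v}\}}$ is one to one
and onto.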


2.4.10 The object of this exercise is to show that a matrix A has a unique
row echelon form $\tilde{A}$: i.e., that all sequences of row operations that turn A into
a matrix in row echelon form produce the same matrix, $\tilde{A}$. This is the harder
part of Theorem 2.1.8.

We will do this by saying explicitly what this matrix is. Let A be an n x m
matrix with columns $\mathbf{a}_1, \dots, \mathbf{a}_m$. Make the matrix $\tilde{A} = [\tilde{\mathbf{a}}_1, \dots, \tilde{\mathbf{a}}_m]$ as follows:

Let $i_1 < \cdots < i_k$ be the indices of the columns that are not linear
combinations of the earlier columns; we will refer to these as the unmarked columns.

Set $\tilde{\mathbf{a}}_{i_j} = \mathbf{e}_j$; this defines the marked columns of $\tilde{A}$.

If $\mathbf{a}_l$ is a linear combination of the earlier columns, let j(l) be the largest
unmarked index such that j(l) < l, and write

\[ \mathbf{a}_l = \sum_{j=1}^{j(l)} \alpha_j \mathbf{a}_{i_j}, \quad \text{setting} \quad \tilde{\mathbf{a}}_l = \begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_{j(l)} \\ 0 \\ \vdots \\ 0 \end{bmatrix}. \qquad (2.4.10) \]

This defines the unmarked columns of $\tilde{A}$.

(a) Show that $\tilde{A}$ is in row echelon form.

(b) Show that if you row reduce A, you get $\tilde{A}$.
2.4.11 Let $\mathbf{v}_1, \dots, \mathbf{v}_k$ be vectors in $\mathbb{R}^n$, and set $V = [\mathbf{v}_1, \dots, \mathbf{v}_k]$.

(a) Show that the set $\mathbf{v}_1, \dots, \mathbf{v}_k$ is orthogonal if and only if $V^T V$ is diagonal.

(b) Show that the set is orthonormal if and only if $V^T V = I_k$.
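
The criterion of Exercise 2.4.11 is easy to experiment with. The sketch below builds $V^T V$ entry by entry as dot products of the columns; the three sets of vectors are hypothetical examples, and the helper names are assumptions made for illustration.

```python
import math

def gram(vectors):
    """Return V^T V, where the given vectors are the columns of V."""
    return [[sum(a * b for a, b in zip(u, w)) for w in vectors] for u in vectors]

def is_diagonal(G, tol=1e-12):
    """True when every off-diagonal entry vanishes (up to rounding)."""
    return all(abs(G[i][j]) <= tol
               for i in range(len(G)) for j in range(len(G)) if i != j)

def is_identity(G, tol=1e-12):
    """True when G is the identity matrix (up to rounding)."""
    return is_diagonal(G, tol) and all(abs(G[i][i] - 1) <= tol for i in range(len(G)))

s = 1 / math.sqrt(2)
orthogonal = [(1, 1, 0), (1, -1, 0)]    # orthogonal but not unit length
orthonormal = [(s, s, 0), (s, -s, 0)]   # orthogonal and unit length
skew = [(1, 1, 0), (1, 0, 0)]           # not orthogonal

for name, vs in (("orthogonal", orthogonal), ("orthonormal", orthonormal), ("skew", skew)):
    G = gram(vs)
    print(name, is_diagonal(G), is_identity(G))
```

The (i, j) entry of $V^T V$ is the dot product of the ith and jth vectors, which is why the two tests reduce to looking at off-diagonal and diagonal entries.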


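2.4.12 (a) Let V be a finite-dimensional vector space, and $\mathbf{v}_1, \dots, \mathbf{v}_k \in V$
linearly independent vectors. Show that there exist $\mathbf{v}_{k+1}, \dots, \mathbf{v}_n$ such that
$\mathbf{v}_1, \dots, \mathbf{v}_n \in V$ is a basis of V.

(b) Let V be a finite-dimensional vector space, and $\mathbf{v}_1, \dots, \mathbf{v}_k \in V$ be a
set of vectors that spans V. Show that there exists a subset $\{i_1, \dots, i_m\}$ of
$\{1, 2, \dots, k\}$ such that $\mathbf{v}_{i_1}, \dots, \mathbf{v}_{i_m}$ is a basis of V.

Exercises for Section 2.5:
Kernels and Images

(a) $\begin{bmatrix} 1 & 1 & 3 \\ 2 & 2 & 6 \end{bmatrix}$
(b) $\begin{bmatrix} 1 & 2 & 3 \\ -1 & 1 & 1 \\ -1 & 4 & 5 \end{bmatrix}$
(c) $\begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ 2 & 3 & 4 \end{bmatrix}$

Matrices for Exercise 2.5.2.

\[ A = \begin{bmatrix} 1 & b \\ a & 2 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 2 & a \\ a & b & a \\ b & b & a \end{bmatrix} \]

Matrices for Exercise 2.5.6.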


2.5.1 Prove that if $T : \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation, then the kernel
of T is a subspace of $\mathbb{R}^n$, and the image of T is a subspace of $\mathbb{R}^m$.

2.5.2 For each of the matrices at left, find a basis for the kernel and a basis 
for the image, using Theorems 2.5.5 and 2.5.7. 
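
The recipe behind this kind of exercise can be sketched in code, assuming (as the surrounding text suggests) that the image basis is read off from the pivotal columns of the original matrix and the kernel basis from the non-pivotal columns of the row-reduced matrix. The matrix `M` is a hypothetical example, not one of the matrices at left, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def rref(rows):
    """Row reduce to echelon form; return (reduced matrix, pivotal column indices)."""
    A = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(A), len(A[0])
    pivots, r = [], 0
    for c in range(n_cols):
        pivot_row = next((i for i in range(r, n_rows) if A[i][c] != 0), None)
        if pivot_row is None:
            continue                                  # no pivot in this column
        A[r], A[pivot_row] = A[pivot_row], A[r]       # switch rows
        A[r] = [x / A[r][c] for x in A[r]]            # make the pivot a 1
        for i in range(n_rows):
            if i != r and A[i][c] != 0:
                factor = A[i][c]
                A[i] = [x - factor * y for x, y in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
        if r == n_rows:
            break
    return A, pivots

def image_basis(rows):
    """The pivotal columns of the original matrix."""
    _, pivots = rref(rows)
    return [[row[c] for row in rows] for c in pivots]

def kernel_basis(rows):
    """One vector per non-pivotal column: that variable is 1, other free variables 0."""
    R, pivots = rref(rows)
    n_cols = len(rows[0])
    basis = []
    for free in (c for c in range(n_cols) if c not in pivots):
        v = [Fraction(0)] * n_cols
        v[free] = Fraction(1)
        for r, p in enumerate(pivots):
            v[p] = -R[r][free]
        basis.append(v)
    return basis

M = [[1, 2, 3],
     [2, 4, 6],
     [1, 0, 1]]   # hypothetical matrix
print(image_basis(M), kernel_basis(M))
```

The number of image basis vectors plus the number of kernel basis vectors equals the number of columns, which is the dimension formula in miniature.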

2.5.3 True or false? (Justify your answer.) Let $f : \mathbb{R}^m \to \mathbb{R}^k$ and
$g : \mathbb{R}^n \to \mathbb{R}^m$ be linear transformations. Then

\[ f \circ g = 0 \text{ implies } \text{Img}\, g = \ker f. \]

2.5.4 Let $P_2$ be the space of polynomials of degree $\le 2$, identified with $\mathbb{R}^3$ by
identifying $a + bx + cx^2$ to $\begin{bmatrix} a \\ b \\ c \end{bmatrix}$.

(a) Write the matrix of the linear transformation $T : P_2 \to P_2$ given by

\[ (T(p))(x) = x p'(x) + x^2 p''(x). \]

(b) Find a basis for the image and the kernel of T.

2.5.5 (a) Let $P_k$ be the space of polynomials of degree $\le k$. Suppose
$T : P_k \to \mathbb{R}^{k+1}$ is a linear transformation. What relation is there between the
dimension of the image of T and the dimension of the kernel of T?

(b) Consider the mapping $T_k : P_k \to \mathbb{R}^{k+1}$ given by

\[ T_k(p) = \begin{bmatrix} p(0) \\ p(1) \\ \vdots \\ p(k) \end{bmatrix}. \]

What is the matrix of $T_2$, where $P_2$ is identified to $\mathbb{R}^3$ by identifying
$a + bx + cx^2$ to $\begin{bmatrix} a \\ b \\ c \end{bmatrix}$?

(c) What is the kernel of $T_k$?



'l 

1 

3 

6 

2 


A = 

2 

-1 

0 

4 

1 

f n 


4 

1 

6 

16 

5 








Jo 

B = 

'2 

2 

I 

-1 

3 

0 

6 

4 

2' 

I 

2.5.6 Make a 


• 





the matrices at le 


Matrices for Exercise 2.5.7 


k 

i = 0 


p(t)dt = ^ c ip(i) for all polynomials p € Pk- 


sketch the dimensions of the images. 


2.5.7 For the matrices A and B at left, find a basis for the image and the
kernel, and verify that the dimension formula is true.

2.5.8 Let P be the space of polynomials of degree at most 2 in the two
variables x, y, which we will identify to $\mathbb{R}^6$ by identifying

\[ a_1 + a_2 x + a_3 y + a_4 x^2 + a_5 xy + a_6 y^2 \quad \text{with} \quad \begin{bmatrix} a_1 \\ \vdots \\ a_6 \end{bmatrix}. \]

(a) What are the matrices of the linear transformations $S, T : P \to P$ given
by

\[ S(p)\begin{pmatrix} x \\ y \end{pmatrix} = x D_1 p\begin{pmatrix} x \\ y \end{pmatrix} \quad \text{and} \quad T(p)\begin{pmatrix} x \\ y \end{pmatrix} = y D_2 p\begin{pmatrix} x \\ y \end{pmatrix}? \]

(b) What are the kernel and the image of the linear transformation

\[ p \mapsto 2p - S(p) - T(p)? \]

2.5.9 Let $a_1, \dots, a_k, b_1, \dots, b_k$ be any 2k numbers. Show that there exists a
unique polynomial p of degree at most 2k - 1 such that

\[ p(i) = a_i, \quad p'(i) = b_i \]

for all integers i with $1 \le i \le k$. In other words, show that the values of p and
p' at 1, ..., k determine p. Hint: you should use the fact that a polynomial p of
degree d such that p(i) = p'(i) = 0 can be written $p(x) = (x - i)^2 q(x)$ for some
polynomial q of degree d - 2.

2.5.10 Decompose the following into partial fractions, as requested, being
explicit in each case about the system of linear equations involved and showing
that its matrix is invertible:

(a) Write

\[ \frac{x + x^2}{(x+1)(x+2)(x+3)} = \frac{A}{x+1} + \frac{B}{x+2} + \frac{C}{x+3}. \]

(b) Write

\[ \frac{x + x^3}{(x+1)^2 (x-1)^3} = \frac{Ax + B}{(x+1)^2} + \frac{Cx^2 + Dx + F}{(x-1)^3}. \]

2.5.11 (a) For what value of a can you not write

\[ \frac{x - 1}{(x+1)(x^2 + ax + 5)} = \frac{A_0}{x+1} + \frac{B_1 x + B_0}{x^2 + ax + 5}\,? \]

(b) Why does this not contradict Proposition 2.5.15?

2.5.12 (a) Let $f(x) = x + Ax^2 + Bx^3$. Find a polynomial $g(x) = x + \alpha x^2 + \beta x^3$
such that $g(f(x)) - x$ is a polynomial starting with terms of degree 4.

(b) Show that if

\[ f(x) = x + \sum_{i=2}^{k} a_i x^i \]

is a polynomial, then there exists a unique polynomial

\[ g(x) = x + \sum_{j=2}^{k} b_j x^j \quad \text{with} \quad g \circ f(x) = x + x^{k+1} p(x) \]

for some polynomial p.


The polynomial p which Exercise 2.5.16 constructs is called the Lagrange
interpolation polynomial, which "interpolates" between the assigned values.

Hint for Exercise 2.5.16: Consider the map from the space $P_n$ of polynomials
of degree n to $\mathbb{R}^{n+1}$ given by

\[ p \mapsto \begin{bmatrix} p(x_0) \\ \vdots \\ p(x_n) \end{bmatrix}. \]

You need to show that this map is onto; by Corollary 2.5.11 it is enough to
show that its kernel is {0}.


2.5.13 A square n x n matrix P such that $P^2 = P$ is called a projector.

(a) Show that P is a projector if and only if I - P is a projector. Show that
if P is invertible, then P is the identity.

(b) Let $V_1 = \text{Img}\, P$ and $V_2 = \ker P$. Show that any vector $\mathbf{v} \in \mathbb{R}^n$ can
be written uniquely $\mathbf{v} = \mathbf{v}_1 + \mathbf{v}_2$ with $\mathbf{v}_1 \in V_1$ and $\mathbf{v}_2 \in V_2$. Hint:
$\mathbf{v} = P(\mathbf{v}) + (\mathbf{v} - P(\mathbf{v}))$.

(c) Show that there exists a basis $\mathbf{v}_1, \dots, \mathbf{v}_n$ of $\mathbb{R}^n$ and a number $k \le n$ such
that $P(\mathbf{v}_1) = \mathbf{v}_1, \dots, P(\mathbf{v}_k) = \mathbf{v}_k, P(\mathbf{v}_{k+1}) = 0, \dots, P(\mathbf{v}_n) = 0$.

(d) Show that, if $P_1$ and $P_2$ are projectors such that $P_1 P_2 = 0$, then
$Q = P_1 + P_2 - (P_2 P_1)$ is a projector, $\ker Q = \ker P_1 \cap \ker P_2$, and the image of Q is
the space spanned by the image of $P_1$ and the image of $P_2$.

2.5.14 Show that if A and B are n xn matrices, and AB is invertible, then 
A and B are invertible. 

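*2.5.15 Let $T_1, T_2 : \mathbb{R}^n \to \mathbb{R}^n$ be linear transformations.

(a) Show that there exists $S : \mathbb{R}^n \to \mathbb{R}^n$ such that $T_1 = S \circ T_2$ if and only if
$\ker T_2 \subset \ker T_1$.

(b) Show that there exists $S : \mathbb{R}^n \to \mathbb{R}^n$ such that $T_1 = T_2 \circ S$ if and only if
$\text{Img}\, T_1 \subset \text{Img}\, T_2$.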

*2.5.16 (a) Find a polynomial $p(x) = a + bx + cx^2$ of degree 2 such that

\[ p(0) = 1, \quad p(1) = 4, \quad \text{and} \quad p(3) = -2. \]

(b) Show that if $x_0, \dots, x_n$ are n + 1 distinct points in $\mathbb{R}$, and $a_0, \dots, a_n$ are
any numbers, there exists a unique polynomial of degree n such that $p(x_i) = a_i$
for each i = 0, ..., n.

(c) Let the $x_i$ and $a_i$ be as above, and let $b_0, \dots, b_n$ be some further set of
numbers. Find a number k such that there exists a unique polynomial of degree
k with

\[ p(x_i) = a_i \quad \text{and} \quad p'(x_i) = b_i \quad \text{for all } i = 0, \dots, n. \]
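
The interpolating polynomial of part (b) can be evaluated directly with the classical Lagrange formula. The sketch below is one standard way to do this, not necessarily the route the exercise intends; the data points are hypothetical and the function name is an assumption.

```python
from fractions import Fraction

def lagrange_value(nodes, values, x):
    """Evaluate at x the polynomial of degree <= n through the n + 1 given points."""
    total = Fraction(0)
    for i, (xi, ai) in enumerate(zip(nodes, values)):
        term = Fraction(ai)
        for j, xj in enumerate(nodes):
            if j != i:
                # factor that equals 1 at x = xi and 0 at x = xj
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total

nodes = (-1, 0, 2)     # hypothetical distinct points x_0, x_1, x_2
values = (3, 1, 9)     # hypothetical prescribed values a_0, a_1, a_2
print([lagrange_value(nodes, values, x) for x in (-1, 0, 1, 2)])
```

For these data the interpolating polynomial is $1 + 2x^2$, so the value at x = 1 is 3; the distinctness of the nodes is what keeps every denominator nonzero.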

*2.5.17 This exercise gives a proof of Bezout's Theorem. Let $p_1$ and $p_2$ be
polynomials of degree $k_1$ and $k_2$ respectively, and consider the mapping

\[ T : (q_1, q_2) \mapsto p_1 q_1 + p_2 q_2, \]

where $q_1$ and $q_2$ are polynomials of degrees $k_2 - 1$ and $k_1 - 1$ respectively, so
that $p_1 q_1 + p_2 q_2$ is of degree $\le k_1 + k_2 - 1$.

Note that the space of such $(q_1, q_2)$ is of dimension $k_1 + k_2$, and the space of
polynomials of degree $k_1 + k_2 - 1$ is also of dimension $k_1 + k_2$.

(a) Show that ker T = {0} if and only if $p_1$ and $p_2$ are relatively prime (have
no common factors).

(b) Use Corollary 2.5.11 to show that if $p_1, p_2$ are relatively prime, then
there exist unique $q_1$ and $q_2$ as above such that

\[ p_1 q_1 + p_2 q_2 = 1. \qquad \text{(Bezout's Theorem)} \]

Exercises for Section 2.6:
Abstract Vector Spaces

Note: To show that a space is not a vector space, you will need
to show that it is not {0}.

2.6.1 Show that the space C(0, 1) of continuous real-valued functions f(x) 
defined for 0 < x < 1 (Example 2.6.2) satisfies all eight requirements for a 
vector space. 

2.6.2 Show that the transformation $T : C^2(\mathbb{R}) \to C(\mathbb{R})$ given by the formula

\[ (T(f))(x) = (x^2 + 1) f''(x) - x f'(x) + 2 f(x) \]

of Example 2.6.7 is a linear transformation.

2.6.3 Show that in a vector space of dimension n, more than n vectors are 
never linearly independent, and fewer than n vectors never span. 

2.6.4 Denote by $\mathcal{L}(\text{Mat}(n, n), \text{Mat}(n, n))$ the space of linear transformations
from Mat (n, n) to Mat (n, n).

(a) Show that $\mathcal{L}(\text{Mat}(n, n), \text{Mat}(n, n))$ is a vector space, and that it is finite
dimensional. What is its dimension?

(b) Prove that for any $A \in$ Mat (n, n), the transformations

\[ L_A, R_A : \text{Mat}(n, n) \to \text{Mat}(n, n) \quad \text{given by} \quad L_A(B) = AB, \quad R_A(B) = BA \]

are linear transformations.

(c) What is the dimension of the subspace of transformations of the form 

(d) Show that there are linear transformations T : Mat (2, 2) $\to$ Mat (2, 2)
that cannot be written as $L_A + R_B$. Can you find an explicit one?

2.6.5 (a) Let V be a vector space. When is a subset $W \subset V$ a subspace of V?

(b) Let V be the vector space of $C^1$ functions on (0, 1). Which of the following
are subspaces of V:

i) $\{ f \in V \mid f(x) = f'(x) + 1 \}$;

ii) $\{ f \in V \mid f(x) = x f'(x) \}$;

iii) $\{ f \in V \mid f(x) = (f'(x))^2 \}$.

2.6.6 Let $V, W \subset \mathbb{R}^n$ be two subspaces.

(a) Show that $V \cap W$ is a subspace of $\mathbb{R}^n$.

(b) Show that if $V \cup W$ is a subspace of $\mathbb{R}^n$, then either $V \subset W$ or $W \subset V$.


2.6.7 Let $P_2$ be the space of polynomials of degree at most two, identified to
$\mathbb{R}^3$ via the coefficients; i.e.,

\[ p(x) = a + bx + cx^2 \in P_2 \quad \text{is identified to} \quad \begin{bmatrix} a \\ b \\ c \end{bmatrix}. \]

Consider the mapping $T : P_2 \to P_2$ given by

\[ T(p)(x) = (x^2 + 1) p''(x) - x p'(x) + 2 p(x). \]

(a) Verify that T is linear, i.e., that $T(a p_1 + b p_2) = a T(p_1) + b T(p_2)$.

(b) Choose the basis of $P_2$ consisting of the polynomials $p_1(x) = 1$, $p_2(x) = x$,
$p_3(x) = x^2$. Denote $\Phi_{\{p\}} : \mathbb{R}^3 \to P_2$ the corresponding concrete-to-abstract
linear transformation. Show that the matrix of $\Phi_{\{p\}}^{-1} \circ T \circ \Phi_{\{p\}}$ is

\[ \begin{bmatrix} 2 & 0 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}. \]

(c) Using the basis $1, x, x^2, \dots, x^n$, compute the matrices of the same
differential operator T, viewed as an operator from $P_3$ to $P_3$, from $P_4$ to $P_4$, ..., $P_n$
to $P_n$ (polynomials of degree at most 3, 4, and n).

2.6.8 Suppose we use the same operator $T : P_2 \to P_2$ as in Exercise 2.6.7,
but choose instead to work with the basis

\[ q_1(x) = x^2, \quad q_2(x) = x^2 + x, \quad q_3(x) = x^2 + x + 1. \]

Now what is the matrix $\Phi_{\{q\}}^{-1} \circ T \circ \Phi_{\{q\}}$?
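
The matrix of the operator $T(p) = (x^2 + 1)p'' - xp' + 2p$ in the monomial basis can be generated mechanically, since $T(x^j) = (j(j-1) - j + 2)\,x^j + j(j-1)\,x^{j-2}$. The following sketch fills in the columns one monomial at a time; the function name is an assumption made for illustration.

```python
def matrix_of_T(n):
    """Matrix of T(p) = (x^2 + 1) p'' - x p' + 2 p on P_n in the basis 1, x, ..., x^n."""
    size = n + 1
    M = [[0] * size for _ in range(size)]
    for j in range(size):                  # column j holds the coefficients of T(x^j)
        M[j][j] = j * (j - 1) - j + 2      # coefficient of x^j
        if j >= 2:
            M[j - 2][j] = j * (j - 1)      # coefficient of x^(j-2), coming from p''
    return M

for row in matrix_of_T(2):
    print(row)
```

Each column is the image of one basis polynomial written in coordinates, so enlarging n only appends rows and columns to the smaller matrices.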


Exercises for Section 2.7:
Newton's Method

2.7.1 (a) What happens if you compute $\sqrt{b}$ by Newton's method, i.e., by
setting

\[ a_{n+1} = \frac{1}{2}\left( a_n + \frac{b}{a_n} \right), \quad \text{starting with } a_0 < 0? \]

(b) What happens if you compute $\sqrt[3]{b}$ by Newton's method, with b > 0,
starting with $a_0 < 0$?

2.7.2 Show (a) that the function |x| is Lipschitz with Lipschitz ratio 1 and
(b) that the function $\sqrt{|x|}$ is not Lipschitz.

2.7.3 (a) Find the formula $a_{n+1} = g(a_n)$ to compute the kth root of a number
by Newton's method.

(b) Interpret this formula as a weighted average.

2.7.4 (a) Compute by hand the number $9^{1/3}$ to six decimals, using Newton's
method, starting at $a_0 = 2$.

(b) Find the relevant quantities $h_0$, $a_1$, M of Kantorovitch's theorem in this
case.

(c) Prove that Newton's method does converge. (You are allowed to use
Kantorovitch's theorem, of course.)

2.7.5 (a) Find a global Lipschitz ratio for the derivative of the mapping
$F : \mathbb{R}^2 \to \mathbb{R}^2$ given by

!?)• 


(b) Do one step of Newton’s method to solve — (o ) 1 s ^ ar ^* n S 

«)• 

(c) Find a disk which you are sure contains a root. 

In Exercise 2.7.7 we advocate using a program like Matlab (Newton.m), but it
is not too cumbersome for a calculator.

2.7.6 (a) Find a global Lipschitz ratio for the derivative of the mapping
$F : \mathbb{R}^2 \to \mathbb{R}^2$ given by

2

F ( *) = ( ' V x ) •

(b) Do one step of Newton’s method to solve 

F (y)~(o) = (o) starting at (8). 


(c) Can you be sure that Newton’s method converges? 

2.7.7 Consider the system of equations

cos x + y = 1.1
x + cos(x + y) = .9

(a) Carry out four steps of Newton's method, starting at $\begin{pmatrix} 0 \\ 0 \end{pmatrix}$. How many
decimals change between the third and the fourth step?

(b) Are the conditions of Kantorovitch's theorem satisfied at the first step?
At the second step?

For Exercise 2.7.8, note that $[2I]^3 = [8I]$, i.e.,

\[ \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}^3 = \begin{bmatrix} 8 & 0 & 0 \\ 0 & 8 & 0 \\ 0 & 0 & 8 \end{bmatrix}. \]

Hint for Exercise 2.7.14 b: This is a bit harder than for Newton's method.
Consider the intervals bounded by $a_n$ and $b/a_n^{k-1}$, and show that they are
nested.

A drawing is recommended for part (c), as computing cube roots is
considerably harder than computing square roots.
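
Newton's method for a system like the one in Exercise 2.7.7 can be sketched in a few lines. This is a plain-Python illustration rather than the Newton.m program the text refers to; the starting point (0, 0) is an assumption, since it is not legible in this copy, and the function names are hypothetical.

```python
import math

def F(x, y):
    """Residuals of the system cos x + y = 1.1, x + cos(x + y) = 0.9."""
    return (math.cos(x) + y - 1.1, x + math.cos(x + y) - 0.9)

def DF(x, y):
    """Jacobian matrix of F, as a pair of rows."""
    s = math.sin(x + y)
    return ((-math.sin(x), 1.0), (1.0 - s, -s))

def newton_step(x, y):
    """Solve DF(x, y) h = -F(x, y) by Cramer's rule and return (x, y) + h."""
    (a, b), (c, d) = DF(x, y)
    f1, f2 = F(x, y)
    det = a * d - b * c
    h1 = (-f1 * d + b * f2) / det
    h2 = (-a * f2 + c * f1) / det
    return x + h1, y + h2

point = (0.0, 0.0)   # assumed starting point
for step in range(1, 5):
    point = newton_step(*point)
    print(step, point, F(*point))
```

Printing the residual after each step makes it easy to see how many decimals stop changing from one iterate to the next.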


2.7.8 Using Newton's method, solve the equation

\[ A^3 = \begin{bmatrix} 9 & 0 & 1 \\ 0 & 7 & 0 \\ 0 & 2 & 8 \end{bmatrix}. \]

2.7.9 Use the Matlab program Newton.m (or the equivalent) to solve the 
systems of equations: 

x 2 - y + sin(i - y) = 2 _ . / 2 \ /_ 2 \ 


y 2 - x = 3 
x 3 - y + sin(x - y) = 5 


starting at 


©•(i) 


starting at (^) , ( j). 


y‘-x = 3 " ^ 

(a) Does Newton’s method appear to superconverge? 

(b) In all cases, determine the numbers which appear in Kantorovitch’s the- 
orem, and check whether the theorem guarantees convergence. 

2.7.10 Find a number $\epsilon > 0$ such that the set of equations

x + y^2 = a
y + z^2 = b
z + x^2 = c

has a unique solution near 0 when $|a|, |b|, |c| < \epsilon$.

2.7.11 Do one step of Newton's method to solve the system of equations

x + cos y - 1.1 = 0
x - sin y + .1 = 0

starting at $\mathbf{a}_0 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$.


2.7.12 (a) Write one step of Newton's method to solve $x^5 - x - 6 = 0$, starting
at $x_0 = 2$.

(b) Prove that this Newton’s method converges. 

2.7.13 Does a 2 x 2 matrix of the form $I + \epsilon B$ have a square root A near
$\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$?

2.7.14 (a) Prove that if you compute $\sqrt[k]{b}$ by Newton's method, as in Exercise
2.7.3, choosing $a_0 > 0$, then the sequence $a_n$ converges to the positive kth root.

(b) Show that this would still be true if you simply applied a divide and
average algorithm:

\[ a_{n+1} = \frac{1}{2}\left( a_n + \frac{b}{a_n^{k-1}} \right). \]

(c) Use Newton's method and "divide and average" (and a calculator or
computer, of course) to compute $\sqrt[3]{2}$, starting at $a_0 = 2$. What can you say
about the speeds of convergence?

Exercises for Section 2.8:
Superconvergence

Hint for Exercise 2.8.5: Try a matrix all of whose entries are equal.
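
The two iterations can be compared side by side. The sketch below assumes the target is the cube root of 2 with starting value 2, uses the standard Newton update for $x^k = b$, and takes "divide and average" to mean averaging $a_n$ with $b/a_n^{k-1}$; the function names are illustrative.

```python
def newton_root(b, k, a, steps):
    """Newton's method for x^k = b: a <- ((k - 1) a + b / a^(k-1)) / k."""
    history = [a]
    for _ in range(steps):
        a = ((k - 1) * a + b / a ** (k - 1)) / k
        history.append(a)
    return history

def divide_and_average(b, k, a, steps):
    """Divide and average: a <- (a + b / a^(k-1)) / 2."""
    history = [a]
    for _ in range(steps):
        a = (a + b / a ** (k - 1)) / 2
        history.append(a)
    return history

target = 2 ** (1 / 3)
for n, (u, v) in enumerate(zip(newton_root(2, 3, 2.0, 8),
                               divide_and_average(2, 3, 2.0, 8))):
    print(n, abs(u - target), abs(v - target))
```

In this sketch the Newton error roughly squares at each step once the iterate is close, while the divide-and-average error only shrinks by a factor of about one half per step.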

2.8.1 Show (Example 2.8.1) that when solving $f(x) = (x - 1)^2 = 0$ by Newton's
method, starting at $a_0 = 0$, the best Lipschitz ratio for $f'$ is 2, so

\[ |f(a_0)| \left| (f'(a_0))^{-1} \right|^2 M = 1 \cdot \left( -\tfrac{1}{2} \right)^2 \cdot 2 = \tfrac{1}{2}, \]

and Theorem 2.7.11 guarantees that Newton's method will work, and will
converge to the unique root a = 1. Check that $h_n = 1/2^{n+1}$ so $a_n = 1 - 1/2^n$,
on the nose the rate of convergence advertised.
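
The slow convergence at a double root is easy to observe numerically. This sketch runs Newton's method on $(x - 1)^2$ from 0 and prints each iterate next to its distance from the root; the function name is an assumption.

```python
def newton_double_root(steps):
    """Newton iterates for f(x) = (x - 1)^2, f'(x) = 2(x - 1), starting at a_0 = 0."""
    a, iterates = 0.0, [0.0]
    for _ in range(steps):
        a = a - (a - 1) ** 2 / (2 * (a - 1))
        iterates.append(a)
    return iterates

for n, a in enumerate(newton_double_root(10)):
    print(n, a, 1 - a)
```

Each step halves the distance to the root instead of squaring it, so the iterates gain only one binary digit at a time.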

2.8.2 (a) Prove (Equation 2.8.12) that the norm of a matrix is at most its
length: $\|A\| \le |A|$.

(b) When are they equal?

2.8.3 Prove that Proposition 1.4.11 is true for the norm $\|A\|$ of a matrix A
as well as for its length |A|: i.e., prove:

(a) If A is an n x m matrix, and $\mathbf{b}$ is a vector in $\mathbb{R}^m$, then

\[ \|A\mathbf{b}\| \le \|A\| \, \|\mathbf{b}\|. \]

(b) If A is an n x m matrix, and B is an m x k matrix, then

\[ \|AB\| \le \|A\| \, \|B\|. \]

2.8.4 Prove that the triangle inequality (Theorem 1.4.9) holds for the norm
$\|A\|$ of a matrix A, i.e., that for any matrices A and B in $\mathbb{R}^n$,

\[ \|A + B\| \le \|A\| + \|B\|. \]

2.8.5 (a) Find a 2 x 2 matrix A such that 


(b) Show that when Newton’s method is used to solve the equation above, 
starting at the identity, it converges. 

2.8.6 For what matrices C can you be sure that the equation $A^2 + A = C$ in
Mat (2, 2) has a solution which can be found starting at 0? At I?

2.8.7 There are other plausible ways to measure matrices other than the
length and the norm; for example, we could declare the size |A| of a matrix A
to be the absolute value of its largest element. In this case, $|A + B| \le |A| + |B|$,
but the statement $|A\mathbf{x}| \le |A| |\mathbf{x}|$ is false. Find an $\epsilon$ so that it is false for

\[ A = \begin{bmatrix} 1 & 1 & 1 + \epsilon \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad \text{and} \quad \mathbf{x} = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}. \]

Starred exercises are difficult; exercises with two stars are particularly
challenging.

**2.8.8 If $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is a 2 x 2 real matrix, show that

\[ \|A\| = \left( \frac{|A|^2 + \sqrt{|A|^4 - 4D^2}}{2} \right)^{1/2}, \]

where D = ad - bc = det A.


Exercises for Section 2.9: 

Inverse and Implicit 
Function Theorems 


2.9.1 Prove Theorem 2.9.2 (the inverse function theorem in 1 dimension). 

2.9.2 Consider the function

\[ f(x) = \begin{cases} \dfrac{x}{2} + x^2 \sin \dfrac{1}{x} & \text{if } x \ne 0, \\ 0 & \text{if } x = 0, \end{cases} \]

discussed in Example 1.9.4. (a) Show that f is differentiable at 0 and that the
derivative is 1/2.

(b) Show that f does not have an inverse on any neighborhood of 0.

(c) Why doesn't this contradict the inverse function theorem, Theorem 2.9.2?

2.9.3 (a) See by direct calculation where the equation $y^2 + y + 3x + 1 = 0$
defines y implicitly as a function of x.

(b) Check that your answer agrees with the answer given by the implicit 
function theorem. 

2.9.4 Consider the mapping f : R 2 - ^ q ^ — ► M 2 given by 



Does f have a local inverse at every point of M 2 ? 


2.9.5 Let y(x) be defined implicitly by 

x 2 + y 3 + = 0 . 

Compute y'(x) in terms of x and y. 


2.9.6 (a) True or false? The equation sin(xyz) = z expresses x implicitly as
a differentiable function of y and z near the point

(TJ 

(b) True or false? The equation sin(xyz) = z expresses z implicitly as a
differentiable function of x and y near the same point.

2.9.7 Does the system of equations

x + y + sin(xy) = a
sin(x^2 + y) = 2a

have a solution for sufficiently small a?

2.9.8 Consider the mapping S : Mat (2, 2) $\to$ Mat (2, 2) given by $S(A) = A^2$.
Observe that S(-I) = I. Does there exist an inverse mapping g, i.e., a mapping
such that S(g(A)) = A, defined in a neighborhood of I, such that g(I) = -I?


2.9.9 True or false? (Explain your answer.) There exists r > 0 and a differ- 
entiable map 

9 ■ B r ^ q _3 ] ) Mat ( 2 > 2 ) such that 9 ( "o _3 ) = __2 -1 
and (#(,4) ) 2 = A for all A € B r Q 

2.9.10 True or false? If / : Ift 3 —» R is continuously differentiable, and 




a function h of defined near Q), such that / 


(!) 


= 0 . 


2.9.11 (a) Show that the mapping 


(y ) - ( e * /-y ) is locally invertible at every point ^ j 6 ift 2 . 


(b) If F( a) = b, what is the derivative of F~ l at b? 


2.9.12 True or false: There exists a neighborhood $U \subset$ Mat (2, 2) of
and a $C^1$ mapping $F : U \to$ Mat (2, 2) with

(1) F ([o 5]) = [2 

(2) (F(y4)) 2 = A. 

You may use the fact that if S : Mat (2, 2) $\to$ Mat (2, 2) denotes the squaring
map $S(A) = A^2$, then (DS(A))B = AB + BA.




3 

Higher Partial Derivatives, 
Quadratic Forms, and Manifolds 


Thomson [Lord Kelvin] had predicted the problems of the first [transatlantic]
cable by mathematics. On the basis of the same mathematics he
now promised the company a rate of eight or even 12 words a minute.
Half a million pounds was being staked on the correctness of a partial
differential equation. -T.W. Körner, Fourier Analysis


3.0 Introduction 


When a computer calculates sines, it is not looking up the answer in some
mammoth table of sines; stored in the computer is a polynomial that very well
approximates sin x for x in some particular range. Specifically, it uses the
formula

\[ \sin x = x + a_3 x^3 + a_5 x^5 + a_7 x^7 + a_9 x^9 + a_{11} x^{11} + \epsilon(x), \]

where the coefficients are

a_3 = -.1666666664
a_5 = .0083333315
a_7 = -.0001984090
a_9 = .0000027526
a_11 = -.0000000239.

When $|x| < \pi/2$, the error is guaranteed to be less than $2 \times 10^{-9}$,
good enough for a calculator which computes to eight significant digits.
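
The quality of this approximation can be checked against a library sine. The sketch below evaluates the polynomial with the five coefficients quoted above, using Horner's rule in $x^2$ (an implementation choice, not something the text prescribes), and measures the largest discrepancy on a grid covering the interval from $-\pi/2$ to $\pi/2$.

```python
import math

# Coefficients a_3, a_5, a_7, a_9, a_11 as quoted in the text.
COEFFS = (-0.1666666664, 0.0083333315, -0.0001984090, 0.0000027526, -0.0000000239)

def sin_poly(x):
    """x + a3 x^3 + ... + a11 x^11, evaluated by Horner's rule in t = x^2."""
    t = x * x
    acc = 0.0
    for a in reversed(COEFFS):
        acc = acc * t + a          # builds a3 + a5 t + a7 t^2 + a9 t^3 + a11 t^4
    return x * (1.0 + acc * t)

grid = [-math.pi / 2 + i * math.pi / 2000 for i in range(2001)]
max_error = max(abs(sin_poly(x) - math.sin(x)) for x in grid)
print(max_error)
```

Only additions and multiplications are needed to evaluate the polynomial, which is the point the margin note is making.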


This chapter is something of a grab bag. The various themes are related, but
the relationship is not immediately apparent. We begin with two sections on
geometry. In Section 3.1 we use the implicit function theorem to define just what
we mean by a smooth curve and a smooth surface. Section 3.2 extends these
definitions to more general k-dimensional "surfaces" in $\mathbb{R}^n$, called manifolds:
surfaces in space (possibly, higher-dimensional space) that locally are graphs of
differentiable mappings.

We switch gears in Section 3.3, where we use higher partial derivatives to 
construct the Taylor polynomial of a function in several variables. We saw in 
Section 1.7 how to approximate a nonlinear function by its derivative; here we 
will see that, as in one dimension, we can make higher-degree approximations 
using a function’s Taylor polynomial. This is a useful fact, since polynomials, 
unlike sines, cosines, exponentials, square roots, logarithms, . . . can actually 
be computed using arithmetic. Computing Taylor polynomials by calculating 
higher partial derivatives can be quite unpleasant; in Section 3.4 we give some 
rules for computing them by combining the Taylor polynomials of simpler func- 
tions. 

In Section 3.5 we take a brief detour, introducing quadratic forms, and seeing 
how to classify them according to their “signature.” In Section 3.6 we see 
that if we consider the second degree terms of a function’s Taylor polynomial 
as a quadratic form, the signature of that form usually tells us whether at a 
particular point the function is a minimum, a maximum or some kind of saddle. 
In Section 3.7 we look at extrema of a function f when f is restricted to some
manifold $M \subset \mathbb{R}^n$.

Finally, in Section 3.8 we give a brief introduction to the vast and important 
subject of the geometry of curves and surfaces. To define curves and surfaces in 


As familiar as these objects are, 
the mathematical definitions of 
smooth curves and smooth sur- 
faces exclude some objects that we 
ordinarily think of as smooth: a 
figure eight, for example. Nor are 
these familiar objects simple: al- 
ready, the theory of soap bubbles 
is a difficult topic, with a compli- 
cated partial differential equation 
controlling the shape of the film. 


Recall that the graph $\Gamma(f)$ of a 
function $f : \mathbb{R}^n \to \mathbb{R}$: 

$\Gamma(f) \subset \mathbb{R}^{n+1}$ 

is the set of pairs $(x, y) \in \mathbb{R}^n \times \mathbb{R}$ 
such that $f(x) = y$. 

Remember from the discussion 
of set theory notation that $I \times J$ 
is the set of pairs $(x, y)$ with $x \in I$ 
and $y \in J$: e.g., the shaded 
rectangle of Figure 3.1.1. 

the beginning of the chapter, we did not need the higher-degree approximations 
provided by Taylor polynomials. To discuss the geometry of curves and surfaces, 
we do need Taylor polynomials: the curvature of a curve or surface depends on 
the quadratic terms of the functions defining it. 

3.1 Curves and Surfaces 

Everyone knows what a curve is, until he has studied enough mathematics 
to become confused through the countless number of possible exceptions— 
F. Klein 

We are all familiar with smooth curves and surfaces. Curves are idealizations 
of things like telephone wires or a tangled garden hose. Beautiful surfaces are 
produced when you blow soap bubbles, especially big ones that wobble and 
slowly vibrate as they drift through the air, almost but not quite spherical. 
More prosaic surfaces can be imagined as an infinitely thin inflated inner tube 
(forget the valve), or for that matter the surface of any smooth object. 

In this section we will see how to define these objects mathematically, and 
how to tell whether the locus defined by an equation or set of equations is a 
smooth curve or smooth surface. We will cover the same material three times, 
once for curves in the plane (also known as plane curves), once for surfaces in 
space and once for curves in space. The entire material will be repeated once 
more in Section 3.2 for more general k-dimensional “surfaces” in $\mathbb{R}^n$. 


Smooth curves in the plane 

When is a subset $X \subset \mathbb{R}^2$ a smooth curve? There are many possible answers, 
but today there seems to be a consensus that the objects defined below are the 
right curves to study. Our form of the definition, which depends on the chosen 
coordinates, might not achieve the same consensus: with this definition, it isn’t 
obvious that if you rotate a smooth curve it is still smooth. (We will see in 
Theorem 3.2.8 that it is.) 

Definition 3.1.1 looks more elaborate than it is. It says that a subset $X \subset \mathbb{R}^2$ 
is a smooth curve if $X$ is locally the graph of a differentiable function, either of 
x in terms of y or of y in terms of x; the detail below simply spells out what 
the word “locally” means. Actually, this is the definition of a “$C^1$ curve”; as 
discussed in the remark following the definition, for our purposes here we will 
consider $C^1$ curves to be “smooth.” 

Definition 3.1.1 (Smooth curve in the plane). A subset $X \subset \mathbb{R}^2$ is a 
$C^1$ curve if for every point $\begin{pmatrix} a \\ b \end{pmatrix} \in X$, there exist open neighborhoods $I$ of $a$ 
and $J$ of $b$, and either a $C^1$ mapping $f : I \to J$ or a $C^1$ mapping $g : J \to I$ 
(or both) such that $X \cap (I \times J)$ is the graph of $f$ or of $g$. 

Note that we do not require 
that the same differentiable map- 
ping work for every point: we can 
switch horses in mid-stream, and 
often we will need to, as in Figure 
3.1.1. 



A function is C 2 (“twice con- 
tinuously differentiable” ) if its first 
and second partial derivatives ex- 
ist and are continuous. It is C 3 if 
its first, second, and third partial 
derivatives exist and are continu- 
ous. 

Some authors use “smooth” to 
mean “infinitely many times dif- 
ferentiable” ; for our purposes, this 
is overkill. 

Exercise 3.1.4 asks you to show 
that every straight line in the 
plane is a smooth curve. 


FIGURE 3.1.1. Above, $I$ and $I_1$ are intervals on the x-axis, while $J$ and $J_1$ are 
intervals on the y-axis. The darkened part of the curve in the shaded rectangle $I \times J$ 
is the graph of a function expressing $x \in I$ as a function of $y \in J$, and the darkened 
part of the curve in $I_1 \times J_1$ is the graph of a function expressing $y \in J_1$ as a function 
of $x \in I_1$. Note that the curve in $I_1 \times J_1$ can also be thought of as the graph of 
a function expressing $x \in I_1$ as a function of $y \in J_1$. But we cannot think of the 
darkened part of the curve in $I \times J$ as the graph of a function expressing $y \in J$ as a 
function of $x \in I$; there are values of x that would give two different values of y, so 
such a “function” is not well defined. 

Remark 3.1.2 (Fuzzy definition of “smooth”). For the purposes of this 
section, “smooth” means “of class $C^1$.” We don’t want to give a precise definition 
of smooth; its meaning depends on context and means “as many times 
differentiable as is relevant to the problem at hand.” In this and the next sec- 
tion, only the first derivatives matter, but later, in Section 3.7 on constrained 
extrema, the curves, surfaces, etc. will need to be twice continuously differen- 
tiable (of class C 2 ), and the curves of Section 3.8 will need to be three times 
continuously differentiable (of class C 3 ). In the section about Taylor polyno- 
mials, it will really matter exactly how many derivatives exist, and there we 
won’t use the word smooth at all. When objects are labeled smooth, we will 
compute derivatives without worrying about whether the derivatives exist. 

Example 3.1.3 (Graph of any smooth function). The graph of any 
smooth function is a smooth curve: for example, the curve of equation $y = x^2$, 
which is the graph of y as a function of x, or the curve of equation $x = y^2$, 
which is the graph of x as a function of y. 

For the first, for every point $\begin{pmatrix} x \\ y \end{pmatrix}$ with $y = x^2$, we can take $I = \mathbb{R}$, $J = \mathbb{R}$ 
and $f(x) = x^2$. △ 

Think of $I = (-1, 1)$ as an 
interval on the x-axis, and $J = (0, 2)$ 
as an interval on the y-axis. 

Note that for the upper half circle 
we could not have taken $J = \mathbb{R}$. 
Of course, $f$ does map $(-1, 1) \to \mathbb{R}$, 
but the intersection 

$S \cap ((-1, 1) \times \mathbb{R})$ 

(where $\mathbb{R}$ is the y-axis) is the 
whole circle with the two points 

$\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} -1 \\ 0 \end{pmatrix}$ 

removed, and not just the graph of 
$f$, which is just the top half of the 
circle. 



Figure 3.1.2. 

The graph of $f(x) = |x|$ is not 
a smooth curve. 



Figure 3.1.3. 

The graph of $f(x) = x^{1/3}$ is 
a smooth curve: although $f$ is 
not differentiable at the origin, the 
function $g(y) = y^3$ is. 


Example 3.1.4 (Unit circle). A more representative example is the unit 
circle of equation $x^2 + y^2 = 1$, which we denote $S$. Here we need the graphs of 
four functions to cover the entire circle: the unit circle is only locally the graph 
of a function. For the upper half of the circle, made up of points $\begin{pmatrix} x \\ y \end{pmatrix}$ with 
$y > 0$, we can take 

$$I = (-1, 1), \quad J = (0, 2) \quad \text{and} \quad f : I \to J \ \text{given by} \ f(x) = \sqrt{1 - x^2}. \tag{3.1.1}$$

We could also take $J = (0, \infty)$, or $J = (0, 1.2)$, but $J = (0, 1)$ will not do, as 
then $J$ will not contain 1, so the point $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$, which is in the circle, will not be 
in the graph. Remember that $I$ and $J$ are open. 

Near the point $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$, $S$ is not the graph of any function $f$ expressing y as 
a function of x, but it is the graph of a function $g$ expressing x as a function 
of y, for example, the function $g : (-1, 1) \to (0, 2)$ given by $x = \sqrt{1 - y^2}$. (In 
this case, $J = (-1, 1)$ and $I = (0, 2)$.) Similarly, near the point $\begin{pmatrix} -1 \\ 0 \end{pmatrix}$, $S$ is the 
graph of the function $g : (-1, 1) \to (-2, 0)$ given by $x = -\sqrt{1 - y^2}$. 

For the lower half of the circle, when $y < 0$, we can choose $I = (-1, 1)$, 
$J = (-2, 0)$, and the function $f : I \to J$ given by $f(x) = -\sqrt{1 - x^2}$. △ 
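
The four local graphs of Example 3.1.4 can be sketched in a few lines of Python. This is only an illustration: the function name `circle_chart` and its return convention are assumptions made for this sketch, not notation from the text. Given a point of the unit circle, it picks one of the four functions and recomputes the dependent coordinate from the independent one.

```python
import math

def circle_chart(x, y):
    """Pick a local graph of the unit circle containing the point (x, y).

    Returns (name, value), where value is the dependent coordinate
    recomputed from the independent one (hypothetical convention).
    """
    if abs(x * x + y * y - 1.0) > 1e-9:
        raise ValueError("point is not on the unit circle")
    if y > 0:   # upper half: y = f(x) = sqrt(1 - x^2), I = (-1, 1), J = (0, 2)
        return "top", math.sqrt(max(0.0, 1.0 - x * x))
    if y < 0:   # lower half: y = -sqrt(1 - x^2), I = (-1, 1), J = (-2, 0)
        return "bottom", -math.sqrt(max(0.0, 1.0 - x * x))
    if x > 0:   # near (1, 0): x = g(y) = sqrt(1 - y^2)
        return "right", math.sqrt(1.0 - y * y)
    return "left", -math.sqrt(1.0 - y * y)   # near (-1, 0)

print(circle_chart(0.6, 0.8))
print(circle_chart(1.0, 0.0))
```

Note that the two points with $y = 0$ are exactly where the sketch has to switch from graphs of y in terms of x to graphs of x in terms of y.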

Above, we expressed all but two points of the unit circle as the graph of 
functions of y in terms of x; we divided the circle into top and bottom. When 
we analyzed the unit circle in Example 2.9.11 we divided the circle into right- 
hand and left-hand sides, expressing all but two (different) points as the graph 
of functions expressing x in terms of y. In both cases we use the same four 
functions and we can use the same choices of I and J. 

Example 3.1.5 (Graphs that are not smooth curves). The graph of the 
function $f : \mathbb{R} \to \mathbb{R}$, $f(x) = |x|$, shown in Figure 3.1.2, is not a smooth curve; it 
is the graph of the function $f$ expressing y as a function of x, of course, but $f$ is 
not differentiable at the origin. Nor is it the graph of a function $g$ expressing x as a function 
of y, since in a neighborhood of $\begin{pmatrix} 0 \\ 0 \end{pmatrix}$ the same value of y gives two values of x. 

The set $X \subset \mathbb{R}^2$ of equation $xy = 0$ (i.e., the union of the two axes) is also 
not a smooth curve; in any neighborhood of $\begin{pmatrix} 0 \\ 0 \end{pmatrix}$, there are infinitely many y’s 
corresponding to x = 0, and infinitely many x’s corresponding to y = 0, so it 
isn’t a graph of a function either way. 

In contrast, the graph of the function $f(x) = x^{1/3}$, shown in Figure 3.1.3, is 
a smooth curve; $f$ is not differentiable at the origin, but the curve is the graph 
of the function $x = y^3$, which is differentiable. 

Example 3.1.6 (A smooth curve can be disconnected). The union $X$ of 
the x and y axes, shown on the left in Figure 3.1.4, is not a smooth curve, but 
$X - \left\{ \begin{pmatrix} 0 \\ 0 \end{pmatrix} \right\}$ is a smooth curve, even though it consists of four distinct pieces. 

Tangent lines and tangent space 


Figure 3.1.4. 

Left: The graph of the x and y 
axes is not a smooth curve. Right: 
The graph of the axes minus the 
origin is a smooth curve. 


Definition 3.1.7 (Tangent line to a smooth plane curve). The tangent 
line to a smooth plane curve $C$ at a point $\begin{pmatrix} a \\ f(a) \end{pmatrix}$ is the line of equation 
$y - f(a) = f'(a)(x - a)$. The tangent line to $C$ at a point $\begin{pmatrix} g(b) \\ b \end{pmatrix}$ is the line 
of equation $x - g(b) = g'(b)(y - b)$. 

You should recognize this as saying that the slope of the graph of $f$ is given 
by $f'$. 

At a point where the curve is neither vertical nor horizontal, it can be thought 
of locally as either a graph of x as a function of y or as a graph of y as a function 
of x. Will this give us two different tangent lines? No. If we have a point 

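$$\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} a \\ f(a) \end{pmatrix} = \begin{pmatrix} g(b) \\ b \end{pmatrix} \in C, \tag{3.1.2}$$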




Figure 3.1.5. 


Top: The tangent line. Middle: 
the tangent space. Bottom: The 
tangent space at the tangent point 
and translated to the origin. 


where $C$ is a graph of $f : I \to J$ and $g : J \to I$, then $g \circ f(x) = x$ (i.e., 
$g(f(x)) = x$). In particular, $g'(b)f'(a) = 1$ by the chain rule, so the line of 
equation $y - f(a) = f'(a)(x - a)$ is also the line of equation $x - g(b) = g'(b)(y - b)$, 
and our definition of the tangent line is consistent. 1 
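
As a numerical sanity check of the relation $g'(b)f'(a) = 1$, here is a minimal Python sketch on the unit circle at the point with $a = 0.6$, $b = 0.8$, where the circle is both the graph of $f(x) = \sqrt{1 - x^2}$ and of $g(y) = \sqrt{1 - y^2}$. The choice of point and the finite-difference helper `derivative` are assumptions made for illustration.

```python
import math

def f(x):
    # upper half circle: y as a function of x
    return math.sqrt(1.0 - x * x)

def g(y):
    # right half circle: x as a function of y
    return math.sqrt(1.0 - y * y)

def derivative(func, t, h=1e-6):
    # central finite difference (an approximation, not an exact derivative)
    return (func(t + h) - func(t - h)) / (2.0 * h)

a, b = 0.6, 0.8              # b = f(a) and a = g(b)
fp = derivative(f, a)        # slope of y in terms of x
gp = derivative(g, b)        # slope of x in terms of y
product = fp * gp            # chain rule predicts 1

# Take a point on the line y - b = f'(a)(x - a) and test it in the
# other equation x - a = g'(b)(y - b).
x = a + 1.0
y = b + fp * (x - a)
residual = (x - a) - gp * (y - b)

print(product, residual)
```

Up to finite-difference error, `product` is 1 and `residual` is 0, which is the consistency the paragraph above asserts.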

Very often the interesting thing to consider is not the tangent line but the 
tangent vectors at a point. Imagine that the curve is a hill down which you are 
skiing or sledding. At any particular moment, you would be interested in the 
slope of the tangent line to the curve: how steep is the hill? But you would also 
be interested in how fast you are going. Mathematically, we would represent 
your speed at a point a by a velocity vector lying on the tangent line to the 
curve at a. The arrow of the velocity vector would indicate what direction you 
are skiing, and its length would say how fast. If you are going very fast, the 
velocity vector will be long; if you have come to a halt while trying to get up 
nerve to proceed, the velocity vector will be the zero vector. 

The tangent space to a smooth curve at a is the collection of vectors of all 
possible lengths, anchored at a and lying on the tangent line, as shown at the 
middle of Figure 3.1.5. 

Definition 3.1.8 (Tangent space to a smooth curve). The tangent 
space to $C$ at $\mathbf{a}$, denoted $T_{\mathbf{a}}C$, is the set of vectors tangent to $C$ at $\mathbf{a}$: i.e., 
vectors from the point of tangency to a point of the tangent line. 


1 Since $g'(b)f'(a) = 1$, we have $f'(a) = 1/g'(b)$, so $y - f(a) = f'(a)(x - a)$ can be 
written 

$$y - b = \frac{x - a}{g'(b)} = \frac{x - g(b)}{g'(b)}, \quad \text{i.e.,} \quad x - g(b) = g'(b)(y - b).$$


Figure 3.1.6. 


The unit circle with tangent 
spaces at $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and at $\begin{pmatrix} -1 \\ 0 \end{pmatrix}$. 
The two tangent spaces are the 
same; they consist of vectors such 
that the increment in the x direction 
is 0. They can be denoted 
$\dot{x} = 0$, where $\dot{x}$ denotes the first 
entry of the vector $\begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix}$; it is not 
a coordinate of a point in the tangent 
line. 


The tangent space will be es- 
sential in the discussion of con- 
strained extrema, in Section 3.7, 
and in the discussion of orienta- 
tion, in Section 6.5. 

Note that a function of the 
form $F\begin{pmatrix} x \\ y \end{pmatrix} = c$ is of a different 
species than the functions $f$ and 
$g$ used to define a smooth curve; 
it is a function of two variables, 
while $f$ and $g$ are functions of one 
variable. If $f$ is a function of one 
variable, its graph is the smooth 
curve of equation $f(x) - y = 0$. 
Then the curve is also given by 
the equation $F\begin{pmatrix} x \\ y \end{pmatrix} = 0$, where 
$F\begin{pmatrix} x \\ y \end{pmatrix} = f(x) - y$. 


The vectors making up the tangent space represent increments to the point a; 
they include the zero vector representing a zero increment. The tangent space 
can be freely translated, as shown at the bottom of Figure 3.1.5: an increment 
has meaning independent of its location in the plane, or in space. Often we 
make use of such translations when describing a tangent space by an equation. 
In Figure 3.1.6, the tangent space to the circle at the point where x = 1 is 
the same as the tangent space to the circle where x = −1; this tangent space 
consists of vectors with no increment in the x direction. (But the equation for 
the tangent line at the point where x = 1 is x = 1, and the equation for the 
tangent line at the point where x = −1 is x = −1; the tangent line is made 
of points, not vectors, and points have a definite location.) To distinguish the 
tangent space from the line x = 0, we will say that the equation for the tangent 
space in Figure 3.1.6 is $\dot{x} = 0$. (This use of a dot above a variable is consistent 
with the use of dots by physicists to denote increments.) 

Level sets as smooth curves 

Graphs of smooth functions are the “obvious” examples of smooth curves. Very 
often, the locus (set of points) we are asked to consider is not the graph of any 
function we can write down explicitly. We can still determine whether such a 
locus is a smooth curve. 

Suppose a locus is defined by an equation of the form $F\begin{pmatrix} x \\ y \end{pmatrix} = c$, such as 
$x^2 - .2x^4 - y^2 = -2$. One way to imagine this locus is to think of cutting the 
graph of $F\begin{pmatrix} x \\ y \end{pmatrix} = x^2 - .2x^4 - y^2$ by the plane $z = -2$. The intersection of the 
graph and the plane is called a level curve ; three such intersections, for different 
values of z, are shown in Figure 3.1.7. How can we tell whether such a level set 
is a smooth curve? We will see that the implicit function theorem is the right 
tool to handle this question. 

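Theorem 3.1.9 (Equations for a smooth curve in $\mathbb{R}^2$). (a) If $U$ is open 
in $\mathbb{R}^2$, $F : U \to \mathbb{R}$ is a differentiable function with Lipschitz derivative, and 
$X_c = \{\mathbf{x} \in U \mid F(\mathbf{x}) = c\}$, then $X_c$ is a smooth curve in $\mathbb{R}^2$ if $[DF(\mathbf{a})]$ is onto 
for all $\mathbf{a} \in X_c$; i.e., if 

$$[DF(\mathbf{a})] \neq 0 \quad \text{for all} \quad \mathbf{a} = \begin{pmatrix} a \\ b \end{pmatrix} \in X_c. \tag{3.1.3}$$

(b) If Equation 3.1.3 is satisfied, then the tangent space to $X_c$ at $\mathbf{a}$ is 
$\ker[DF(\mathbf{a})]$: 

$$T_{\mathbf{a}}X_c = \ker[DF(\mathbf{a})].$$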
The condition that $[DF(\mathbf{a})]$ be 
onto is the crucial condition of the 
implicit function theorem. 

Because $[DF(\mathbf{a})]$ is a $1 \times 2$ matrix 
(a transformation from $\mathbb{R}^2$ to 
$\mathbb{R}$), the following statements mean 
the same thing: 

for all $\mathbf{a} = \begin{pmatrix} a \\ b \end{pmatrix} \in X_c$, 

(1) $[DF(\mathbf{a})]$ is onto. 

(2) $[DF(\mathbf{a})] \neq 0$. 

(3) At least one of $D_1F(\mathbf{a})$ or 
$D_2F(\mathbf{a})$ is not 0. 

Note that 

$[DF(\mathbf{a})] = [D_1F(\mathbf{a}), D_2F(\mathbf{a})]$; 
saying that $[DF(\mathbf{a})]$ is onto is saying 
that any real number can be 
expressed as a linear combination 
$D_1F(\mathbf{a})\alpha + D_2F(\mathbf{a})\beta$ for some 
$\alpha$ and $\beta$. 



Figure 3.1.7. The surface $F\begin{pmatrix} x \\ y \end{pmatrix} = x^2 - .2x^4 - y^2$ sliced horizontally by setting 
z equal to three different constants. The intersection of the surface and the plane 
z = c used to slice it is known as a level set. (This intersection is of course the same 
as the locus of equation $F\begin{pmatrix} x \\ y \end{pmatrix} = c$.) The three level sets shown above are smooth 

curves. If we were to “slice” the surface at a maximum of F, we would get a point, 
not a smooth curve. If we were to slice it at a saddle point (also a point where the 
derivative of F is 0), we would get a figure eight, not a smooth curve. 


Part (b) of Theorem 3.1.9 relates 
the algebraic notion of 
$\ker[DF(\mathbf{a})]$ to the geometrical notion 
of a tangent space. 

Saying that $\ker[DF(\mathbf{a})]$ is the 
tangent space to $X_c$ at $\mathbf{a}$ says that 
every vector $\vec{v}$ tangent to $X_c$ at $\mathbf{a}$ 
satisfies the equation 

$[DF(\mathbf{a})]\vec{v} = 0$. 

This puzzled one student, who argued 
that for this equation to be 
true, either $[DF(\mathbf{a})]$ or $\vec{v}$ must be 
0, yet Equation 3.1.3 says that 
$[DF(\mathbf{a})] \neq 0$. This is forgetting 
that $[DF(\mathbf{a})]$ is a matrix. For example: 
if $[DF(\mathbf{a})]$ is the line matrix 
$[2, -2]$, then $[2, -2]\begin{bmatrix} 1 \\ 1 \end{bmatrix} = 0$. 


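Example 3.1.10 (Finding the tangent space). We have no idea what the 
locus $X_c$ defined by $x^9 + 2x^3 + y + y^5 = c$ looks like, but the derivative of the 
function $F\begin{pmatrix} x \\ y \end{pmatrix} = x^9 + 2x^3 + y + y^5$ is 

$$\left[DF\begin{pmatrix} x \\ y \end{pmatrix}\right] = [\underbrace{9x^8 + 6x^2}_{D_1F},\ \underbrace{1 + 5y^4}_{D_2F}], \tag{3.1.4}$$

which is never 0, so $X_c$ is a smooth curve for all $c$. At the point $\begin{pmatrix} 1 \\ 1 \end{pmatrix} \in X_5$, the 
derivative $\left[DF\begin{pmatrix} 1 \\ 1 \end{pmatrix}\right]$ is $[15, 6]$, so the equation of the tangent space to $X_5$ at 
that point is $15\dot{x} + 6\dot{y} = 0$. △ 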

Proof of Theorem 3.1.9. (a) Choose $\mathbf{a} = \begin{pmatrix} a \\ b \end{pmatrix} \in X_c$. The hypothesis 
$[DF(\mathbf{a})] \neq 0$ implies that at least one of $D_1F\begin{pmatrix} a \\ b \end{pmatrix}$ or $D_2F\begin{pmatrix} a \\ b \end{pmatrix}$ is not 0; let 
us suppose $D_2F\begin{pmatrix} a \\ b \end{pmatrix} \neq 0$ (i.e., the second variable, y, is the pivotal variable, 
which will be expressed as a function of the non-pivotal variable x). 

Note that the derivative of the 
implicit function, in this case $f'$, is 
evaluated at $a$, not at $\mathbf{a} = \begin{pmatrix} a \\ b \end{pmatrix}$. 



This is what is needed in order to apply the short version of the implicit 
function theorem (Theorem 2.9.9). Replacing $F$ by $F - c$ if necessary, we may take $c = 0$; 
the equation $F\begin{pmatrix} x \\ y \end{pmatrix} = 0$ then expresses y implicitly as a 
function of x in a neighborhood of $a$. 

More precisely, there exists a neighborhood $U$ of $a$ in $\mathbb{R}$, a neighborhood $V$ of 
$b$, and a continuously differentiable mapping $f : U \to V$ such that $F\begin{pmatrix} x \\ f(x) \end{pmatrix} = 0$ 
for all $x \in U$. The implicit function theorem also guarantees that we can 
choose $U$ and $V$ so that when x is chosen in $U$, then $f(x)$ is the only $y \in V$ 
such that $F\begin{pmatrix} x \\ y \end{pmatrix} = 0$. In other words, $X_c \cap (U \times V)$ is exactly the graph of $f$, 
which is our definition of a curve. 

(b) Now we need to prove that the tangent space $T_{\mathbf{a}}X_c$ is $\ker[DF(\mathbf{a})]$. For 
this we need the formula for the derivative of the implicit function, in Theorem 
2.9.10 (the long version of the implicit function theorem). Let us suppose that 
$D_2F(\mathbf{a}) \neq 0$, so that, as above, the curve has the equation $y = f(x)$ near 
$\mathbf{a} = \begin{pmatrix} a \\ b \end{pmatrix}$, and its tangent space has equation $\dot{y} = f'(a)\dot{x}$. 

The implicit function theorem (Equation 2.9.25) says that the derivative of 
the implicit function $f$ is 

$$f'(a) = [Df(a)] = -D_2F(\mathbf{a})^{-1} D_1F(\mathbf{a}). \tag{3.1.5}$$

Substituting this value for $f'(a)$ in the equation $\dot{y} = f'(a)\dot{x}$, we get 

$$\dot{y} = -D_2F(\mathbf{a})^{-1} D_1F(\mathbf{a})\,\dot{x}. \tag{3.1.6}$$

Multiplying through by $D_2F(\mathbf{a})$ gives $D_2F(\mathbf{a})\dot{y} = -D_1F(\mathbf{a})\dot{x}$, so 


If you know a curve as a graph, 
this procedure will give you the 
tangent space as a graph. If you 
know it as an equation, it will 
give you an equation for the tan- 
gent space. If you know it by a 
parametrization, it will give you 
a parametrization for the tangent 
space. 

The same rule applies to sur- 
faces and higher-dimensional man- 
ifolds. 


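$$0 = D_1F(\mathbf{a})\dot{x} + D_2F(\mathbf{a})\dot{y} = \underbrace{[D_1F(\mathbf{a}), D_2F(\mathbf{a})]}_{[DF(\mathbf{a})]} \begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix}. \quad \square \tag{3.1.7}$$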
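
Equation 3.1.5 can be tested numerically. The sketch below is an illustration under stated assumptions: it reuses the function of Example 3.1.10 on the level $c = 5$, solves for the implicit function by bisection (possible here because that function is strictly increasing in y), and compares a finite-difference slope at $x = 1$ with $-D_2F^{-1}D_1F = -15/6$. The helper names `implicit_y` and `slope_fd` are hypothetical.

```python
def F(x, y):
    # Example 3.1.10 with c = 5 moved to the left-hand side
    return x**9 + 2 * x**3 + y + y**5 - 5.0

def implicit_y(x, lo=-10.0, hi=10.0, iters=200):
    """Solve F(x, y) = 0 for y by bisection (F is increasing in y)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if F(x, mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

h = 1e-4
slope_fd = (implicit_y(1.0 + h) - implicit_y(1.0 - h)) / (2.0 * h)

d1 = 9 * 1**8 + 6 * 1**2    # D1F at (1, 1)
d2 = 1 + 5 * 1**4           # D2F at (1, 1)
slope_formula = -d1 / d2    # Equation 3.1.5

print(slope_fd, slope_formula)
```

The two slopes agree to within the finite-difference error, which is the content of Equation 3.1.5 at this point.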


Remark. Part (b) is one instance of the golden rule: to find the tangent space 
to a curve, do unto the increment $\begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix}$ with the derivative whatever you did to 
points with the function to get your curve. For instance: 

• If the curve is the graph of $f$, i.e., has equation $y = f(x)$, the tangent space 
at $\begin{pmatrix} a \\ f(a) \end{pmatrix}$ is the graph of $f'(a)$, i.e., has equation $\dot{y} = f'(a)\dot{x}$. 

• If the curve has equation $F\begin{pmatrix} x \\ y \end{pmatrix} = 0$, then the tangent space at $\begin{pmatrix} x_0 \\ y_0 \end{pmatrix}$ has 
equation 

$$\left[DF\begin{pmatrix} x_0 \\ y_0 \end{pmatrix}\right] \begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix} = 0.$$


Figure 3.1.8. 


Why? The result of “do unto the increment ...” will be the best linear 
approximation to the locus defined by “whatever you did to points ...” △ 


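Example 3.1.11 (When is a level set a smooth curve?). Consider the 
function $F\begin{pmatrix} x \\ y \end{pmatrix} = x^4 + y^4 + x^2 - y^2$. We have 

$$\left[DF\begin{pmatrix} x \\ y \end{pmatrix}\right] = [\underbrace{4x^3 + 2x}_{D_1F},\ \underbrace{4y^3 - 2y}_{D_2F}] = [2x(2x^2 + 1),\ 2y(2y^2 - 1)]. \tag{3.1.8}$$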


The locus of equation $x^4 + y^4 + x^2 - y^2 = -1/4$ 
consists of the two 
points at $\pm 1/\sqrt{2}$ on the y-axis; it 
is not a smooth curve. Nor is the 
figure eight, which is the locus of 
equation $x^4 + y^4 + x^2 - y^2 = 0$. The 
other curves are smooth curves. 
The arrows on the lines are an 
artifact of the drawing program. 


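There are no real solutions to $2x^2 + 1 = 0$; the only places where both partials 
vanish are 

$$\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 0 \\ \pm 1/\sqrt{2} \end{pmatrix}, \tag{3.1.9}$$

where $F$ takes on the values 0 and $-1/4$. Thus for any number $c \neq 0$ and 
$c \neq -1/4$, the locus of equation $c = x^4 + y^4 + x^2 - y^2$ is a smooth curve. 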

Some examples are plotted in Figure 3.1.8. Indeed, the locus of equation 
$x^4 + y^4 + x^2 - y^2 = -1/4$ consists of precisely two points, and is nothing you 
would want to call a curve, while the locus of equation $x^4 + y^4 + x^2 - y^2 = 0$ is 
a figure eight, and near the origin looks like two intersecting lines; to make it 
a smooth curve we would have to take out the point where the lines intersect. 
The others really are things one would want to call smooth curves. 
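
The bookkeeping in Example 3.1.11 can be mirrored in a short Python sketch. It is an illustration only: the helper `level_set_guaranteed_smooth` is a hypothetical name, and it encodes just the sufficient condition of Theorem 3.1.9 (the level $c$ avoids every value taken by $F$ at a point where both partial derivatives vanish).

```python
import math

def F(x, y):
    return x**4 + y**4 + x**2 - y**2

def DF(x, y):
    return (4 * x**3 + 2 * x, 4 * y**3 - 2 * y)

r = 1.0 / math.sqrt(2.0)
critical_points = [(0.0, 0.0), (0.0, r), (0.0, -r)]
critical_values = [F(x, y) for x, y in critical_points]

def level_set_guaranteed_smooth(c, tol=1e-12):
    """True if c differs from every critical value, so Theorem 3.1.9 applies."""
    return all(abs(c - v) > tol for v in critical_values)

for c in (1.0, 0.0, -0.25):
    print(c, level_set_guaranteed_smooth(c))
```

A `False` answer only says the theorem gives no information; the example shows by inspection that the levels $0$ and $-1/4$ really do fail to be smooth curves.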


Smooth surfaces in R 3 


“Smooth curve” means some- 
thing different in mathematics and 
in common speech: a figure eight 
is not a smooth curve, while the 
four separate straight lines of Ex- 
ample 3.1.6 form a smooth curve. 
In addition, by our definition the 
empty set (which arises in Exam- 
ple 3.1.11 if c < -1/4) is also a 
smooth curve! Allowing the empty 
set to be a smooth curve makes a 
number of statements simpler. 


Our definition of a smooth surface in R 3 is a clone of the definition of a curve. 


Definition 3.1.12 (Smooth surface). A subset $S \subset \mathbb{R}^3$ is a smooth 
surface if for every point $\mathbf{a} = \begin{pmatrix} a \\ b \\ c \end{pmatrix} \in S$, there are neighborhoods $I$ of $a$, $J$ 
of $b$ and $K$ of $c$, and either a differentiable mapping 

• $f : I \times J \to K$, i.e., z as a function of (x, y) or 

• $g : I \times K \to J$, i.e., y as a function of (x, z) or 

• $h : J \times K \to I$, i.e., x as a function of (y, z), 

such that $S \cap (I \times J \times K)$ is the graph of $f$, $g$, or $h$. 

We will see in Proposition 3.2.8 
that the choice of coordinates 
doesn’t matter; if you rotate a 
smooth surface in any way, it is 
still a smooth surface. 


If at a point xo the surface is 
simultaneously the graph of z as a 
function of x and y, y as a function 
of x and z , and x as a function 
of y and z, then the corresponding 
equations for the tangent planes to 
the surface at xo denote the same 
plane, as you are asked to show in 
Exercise 3.1.9. 


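Definition 3.1.13 (Tangent plane to a smooth surface). The tangent 
plane to a smooth surface $S$ at $\begin{pmatrix} a \\ b \\ c \end{pmatrix}$ is the plane of the equations 

$$\begin{aligned}
z - c &= \left[Df\begin{pmatrix} a \\ b \end{pmatrix}\right] \begin{bmatrix} x - a \\ y - b \end{bmatrix} = D_1f\begin{pmatrix} a \\ b \end{pmatrix}(x - a) + D_2f\begin{pmatrix} a \\ b \end{pmatrix}(y - b) \\
y - b &= \left[Dg\begin{pmatrix} a \\ c \end{pmatrix}\right] \begin{bmatrix} x - a \\ z - c \end{bmatrix} = D_1g\begin{pmatrix} a \\ c \end{pmatrix}(x - a) + D_2g\begin{pmatrix} a \\ c \end{pmatrix}(z - c) \\
x - a &= \left[Dh\begin{pmatrix} b \\ c \end{pmatrix}\right] \begin{bmatrix} y - b \\ z - c \end{bmatrix} = D_1h\begin{pmatrix} b \\ c \end{pmatrix}(y - b) + D_2h\begin{pmatrix} b \\ c \end{pmatrix}(z - c)
\end{aligned} \tag{3.1.10}$$

in the three cases above. 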

As in the case of curves, we will distinguish between the tangent plane, given 
above, and the tangent space. 

Definition 3.1.14 (Tangent space to a smooth surface). The tangent 
space to a smooth surface $S$ at $\mathbf{a}$ is the plane composed of the vectors tangent 
to the surface at $\mathbf{a}$, i.e., vectors going from the point of tangency $\mathbf{a}$ to a point 
of the tangent plane. It is denoted $T_{\mathbf{a}}S$. 


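As before, $\dot{x}$ denotes an increment 
in the x direction, $\dot{y}$ an increment 
in the y direction, and so 
on. When the tangent space is anchored 
at $\mathbf{a}$, the vector $\begin{bmatrix} \dot{x} \\ \dot{y} \\ \dot{z} \end{bmatrix}$ is an 
increment from the point $\begin{pmatrix} a \\ b \\ c \end{pmatrix}$. 

The equation for the tangent space to a surface is: 

$$\begin{aligned}
\dot{z} &= \left[Df\begin{pmatrix} a \\ b \end{pmatrix}\right] \begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix} = D_1f\begin{pmatrix} a \\ b \end{pmatrix}\dot{x} + D_2f\begin{pmatrix} a \\ b \end{pmatrix}\dot{y} \\
\dot{y} &= \left[Dg\begin{pmatrix} a \\ c \end{pmatrix}\right] \begin{bmatrix} \dot{x} \\ \dot{z} \end{bmatrix} = D_1g\begin{pmatrix} a \\ c \end{pmatrix}\dot{x} + D_2g\begin{pmatrix} a \\ c \end{pmatrix}\dot{z} \\
\dot{x} &= \left[Dh\begin{pmatrix} b \\ c \end{pmatrix}\right] \begin{bmatrix} \dot{y} \\ \dot{z} \end{bmatrix} = D_1h\begin{pmatrix} b \\ c \end{pmatrix}\dot{y} + D_2h\begin{pmatrix} b \\ c \end{pmatrix}\dot{z}
\end{aligned} \tag{3.1.11}$$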

Example 3.1.15 (Sphere in $\mathbb{R}^3$). Consider the unit sphere: the set 

$$S^2 = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \text{ such that } x^2 + y^2 + z^2 = 1 \right\}. \tag{3.1.12}$$

This is a smooth surface. Let 

$$U_{x,y} = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \text{ such that } x^2 + y^2 < 1,\ z = 0 \right\} \tag{3.1.13}$$

Many students find it very hard 
to call the sphere of equation 

x 2 + y 2 + z 2 = 1 

two-dimensional. But when we 
say that Chicago is “at” x latitude 
and y longitude, we are treating 
the surface of the earth as two- 
dimensional. 


In Theorem 3.1.16 we could say 
“if $[DF(\mathbf{a})]$ is onto, then $X$ is a 
smooth surface.” Since $F$ goes 
from $U \subset \mathbb{R}^3$ to $\mathbb{R}$, the derivative 
$[DF(\mathbf{a})]$ is a row matrix with three 
entries, $D_1F$, $D_2F$, and $D_3F$. The 
only way it can fail to be onto is if 
all three entries are 0. 


You should be impressed by 
Example 3.1.17. The implicit 
function theorem is hard to prove, 
but the work pays off. With- 
out having any idea what the set 
defined by Equation 3.1.16 might 
look like, we were able to deter- 
mine, with hardly any effort, that 
it is a smooth surface. Figuring 
out what the surface looks like — 
or even whether the set is empty — 
is another matter. Exercise 3.1.15 
outlines what it looks like in this 
case, but usually this kind of thing 
can be quite hard indeed. 


be the unit disk in the (x, y)-plane, and $\mathbb{R}_z^+$ the positive part of the z-axis. 
Then 

$$S^2 \cap (U_{x,y} \times \mathbb{R}_z^+) \tag{3.1.14}$$

is the graph of the function $U_{x,y} \to \mathbb{R}_z^+$ given by $\sqrt{1 - x^2 - y^2}$. 

This shows that $S^2$ is a surface near every point where z > 0, and considering 
$-\sqrt{1 - x^2 - y^2}$ should convince you that $S^2$ is also a smooth surface near any 
point where z < 0. 

In the case where z = 0, we can consider 

(1) $U_{x,z}$ and $U_{y,z}$; 

(2) the half-axes $\mathbb{R}_y^+$ and $\mathbb{R}_x^+$; and 

(3) the mappings $\pm\sqrt{1 - x^2 - z^2}$ and $\pm\sqrt{1 - y^2 - z^2}$, 

as Exercise 3.1.5 asks you to do. △ 
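
The case analysis of Example 3.1.15 can be sketched as a small Python function. This is an illustration under stated assumptions, not part of the text: the name `sphere_chart` and its return convention are hypothetical. For a point of the unit sphere it chooses which coordinate to express as a function of the other two, following the order of cases above (first $z \neq 0$, then $z = 0$).

```python
import math

def sphere_chart(x, y, z):
    """Choose a local graph of the unit sphere through (x, y, z).

    Returns (dependent_variable, value), where value is recomputed
    from the other two coordinates (hypothetical convention).
    """
    if abs(x * x + y * y + z * z - 1.0) > 1e-9:
        raise ValueError("point is not on the unit sphere")
    if z != 0.0:    # graph of z = +/- sqrt(1 - x^2 - y^2) over the unit disk
        sign = 1.0 if z > 0 else -1.0
        return "z", sign * math.sqrt(max(0.0, 1.0 - x * x - y * y))
    if y != 0.0:    # z = 0: graph of y = +/- sqrt(1 - x^2 - z^2)
        sign = 1.0 if y > 0 else -1.0
        return "y", sign * math.sqrt(max(0.0, 1.0 - x * x - z * z))
    sign = 1.0 if x > 0 else -1.0    # remaining points: x = +/- 1
    return "x", sign * math.sqrt(max(0.0, 1.0 - y * y - z * z))

print(sphere_chart(0.0, 0.0, 1.0))
print(sphere_chart(0.6, 0.8, 0.0))
print(sphere_chart(-1.0, 0.0, 0.0))
```

Six graphs (two signs for each of three dependent variables) suffice to cover the whole sphere.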

Most often, surfaces are defined by an equation like $x^2 + y^2 + z^2 = 1$, which 
is probably familiar, or $\sin(x + yz) = 0$, which is surely not. That the first is a 
surface won’t surprise anyone, but what about the second? Again, the implicit 
function theorem comes to the rescue, showing how to determine whether a 
given locus is a smooth surface. 


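Theorem 3.1.16 (Smooth surface in $\mathbb{R}^3$). (a) Let $U$ be an open subset 
of $\mathbb{R}^3$, $F : U \to \mathbb{R}$ a differentiable function with Lipschitz derivative and 

$$X = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \in \mathbb{R}^3 \;\middle|\; F(\mathbf{x}) = 0 \right\}. \tag{3.1.15}$$

If at every $\mathbf{a} \in X$ we have $[DF(\mathbf{a})] \neq 0$, then $X$ is a smooth surface. 
(b) The tangent space $T_{\mathbf{a}}X$ to the smooth surface is $\ker[DF(\mathbf{a})]$. 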


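Example 3.1.17 (Smooth surface in $\mathbb{R}^3$). Consider the set $X$ defined by 
the equation 

$$F\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \sin(x + yz) = 0. \tag{3.1.16}$$

The derivative is 

$$\left[DF\begin{pmatrix} a \\ b \\ c \end{pmatrix}\right] = [\cos(a + bc),\ c\cos(a + bc),\ b\cos(a + bc)]. \tag{3.1.17}$$

On $X$, by definition, $\sin(a + bc) = 0$, so $\cos(a + bc) \neq 0$, so $X$ is a smooth 
surface. △ 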
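
A quick numerical illustration of Example 3.1.17, written as a Python sketch with hypothetical helper names (`DF`, `point_on_X`): points of $X$ are produced by choosing $y$, $z$ and an integer $k$ and setting $x = k\pi - yz$, and at each such point the first entry of the derivative is $\pm 1$, so the derivative is never 0 on $X$.

```python
import math

def DF(x, y, z):
    # partial derivatives of sin(x + y*z) with respect to x, y, z
    c = math.cos(x + y * z)
    return (c, z * c, y * c)

def point_on_X(y, z, k):
    """A point with x + y*z = k*pi, hence sin(x + y*z) = 0 (up to rounding)."""
    return (k * math.pi - y * z, y, z)

samples = [point_on_X(y, z, k)
           for y in (-2.0, 0.0, 1.5)
           for z in (-1.0, 0.5, 3.0)
           for k in (-1, 0, 2)]

first_entries = [abs(DF(*p)[0]) for p in samples]
print(min(first_entries))
```

Since $\cos^2 = 1 - \sin^2$, the printed minimum is 1 up to rounding, which is the numerical face of the argument in the example.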
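Proof of Theorem 3.1.16. Again, this is an application of the implicit 
function theorem. If for instance $D_1F(\mathbf{a}) \neq 0$ at some point $\mathbf{a} \in X$, then the 
condition $F(\mathbf{x}) = 0$ locally expresses x as a function $h$ of y and z (see Definition 
3.1.12). This proves (a). 

For part (b), recall Equation 3.1.11, which says that in this case the tangent 
space $T_{\mathbf{a}}X$ has equation 

$$\dot{x} = \left[Dh\begin{pmatrix} b \\ c \end{pmatrix}\right] \begin{bmatrix} \dot{y} \\ \dot{z} \end{bmatrix}. \tag{3.1.18}$$

But the implicit function theorem says that 

$$\left[Dh\begin{pmatrix} b \\ c \end{pmatrix}\right] = -[D_1F(\mathbf{a})]^{-1}[D_2F(\mathbf{a}), D_3F(\mathbf{a})]. \tag{3.1.19}$$

(Can you explain how Equation 3.1.19 follows from the implicit function 
theorem? Check your answer below. 2) 

Substituting this value for $\left[Dh\begin{pmatrix} b \\ c \end{pmatrix}\right]$ in Equation 3.1.18 gives 

$$\dot{x} = -[D_1F(\mathbf{a})]^{-1}[D_2F(\mathbf{a}), D_3F(\mathbf{a})] \begin{bmatrix} \dot{y} \\ \dot{z} \end{bmatrix}, \tag{3.1.20}$$

and multiplying through by $D_1F(\mathbf{a})$, we get 

$$[D_1F(\mathbf{a})]\dot{x} = -[D_1F(\mathbf{a})][D_1F(\mathbf{a})]^{-1}[D_2F(\mathbf{a}), D_3F(\mathbf{a})] \begin{bmatrix} \dot{y} \\ \dot{z} \end{bmatrix}, \quad \text{so} \tag{3.1.21}$$

$$[D_2F(\mathbf{a}), D_3F(\mathbf{a})] \begin{bmatrix} \dot{y} \\ \dot{z} \end{bmatrix} + [D_1F(\mathbf{a})]\dot{x} = 0; \quad \text{i.e.,}$$

$$\underbrace{[D_1F(\mathbf{a}), D_2F(\mathbf{a}), D_3F(\mathbf{a})]}_{[DF(\mathbf{a})]} \begin{bmatrix} \dot{x} \\ \dot{y} \\ \dot{z} \end{bmatrix} = 0, \quad \text{or} \quad [DF(\mathbf{a})] \begin{bmatrix} \dot{x} \\ \dot{y} \\ \dot{z} \end{bmatrix} = 0. \tag{3.1.22}$$

So the tangent space is the kernel of $[DF(\mathbf{a})]$. □ 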


2 Recall Equation 2.9.25 for the derivative of the implicit function: 

$$[D\mathbf{g}(\mathbf{b})] = -[\underbrace{D_1\mathbf{F}(\mathbf{c}), \dots, D_n\mathbf{F}(\mathbf{c})}_{\text{partial deriv. for pivotal variables}}]^{-1} [\underbrace{D_{n+1}\mathbf{F}(\mathbf{c}), \dots, D_{n+m}\mathbf{F}(\mathbf{c})}_{\text{partial deriv. for non-pivotal variables}}].$$

Our assumption was that at some point $\mathbf{a} \in X$ the equation $F(\mathbf{x}) = 0$ locally expresses 
x as a function of y and z. In Equation 3.1.19 $D_1F(\mathbf{a})$ is the partial derivative with 
respect to the pivotal variable, while $D_2F(\mathbf{a})$ and $D_3F(\mathbf{a})$ are the partial derivatives 
with respect to the non-pivotal variables. 
For smooth curves in $\mathbb{R}^2$ or 
smooth surfaces in $\mathbb{R}^3$, we always 
had one variable expressed as a 
function of the other variable or 
variables. Now we have two vari- 
ables expressed as a function of the 
other variable. 

This means that curves in space 
have two degrees of freedom, as 
opposed to one for curves in the 
plane and surfaces in space; they 
have more freedom to wiggle and 
get tangled. A sheet can get a lit- 
tle tangled in a washing machine, 
but if you put a ball of string in 
the washing machine you will have 
a fantastic mess. Think too of tan- 
gled hair. That is the natural state 
of curves in R 3 . 

Note that our functions f,g, 
and k are bold. The function f, 
for example, is 



Smooth curves in $\mathbb{R}^3$ 

A subset X C R 3 is a smooth curve if it is locally the graph of either 

• y and z as functions of x or 

• x and z as functions of y or 

• x and y as functions of z. 

Let us spell out the meaning of “locally.” 


Definition 3.1.18 (Smooth curve in $\mathbb{R}^3$). A subset $X \subset \mathbb{R}^3$ is a smooth 
curve if for every $\mathbf{a} = \begin{pmatrix} a \\ b \\ c \end{pmatrix} \in X$, there exist neighborhoods $I$ of $a$, $J$ of $b$ 
and $K$ of $c$, and a differentiable mapping 

• $\mathbf{f} : I \to J \times K$, i.e., y, z as a function of x or 

• $\mathbf{g} : J \to I \times K$, i.e., x, z as a function of y or 

• $\mathbf{k} : K \to I \times J$, i.e., x, y as a function of z, 

such that $X \cap (I \times J \times K)$ is the graph of $\mathbf{f}$, $\mathbf{g}$ or $\mathbf{k}$ respectively. 


If y and z are functions of x, then the tangent line to $X$ at $\begin{pmatrix} a \\ b \\ c \end{pmatrix}$ is the line 
of intersection of the two planes 

$$y - b = f_1'(a)(x - a) \quad \text{and} \quad z - c = f_2'(a)(x - a). \tag{3.1.23}$$


What are the equations if x and z are functions of y? If x and y are functions 
of z? Check your answers below. 3 

The tangent space is the subspace given by the same equations, where the 
increment $x - a$ is written $\dot{x}$ and similarly $y - b = \dot{y}$, and $z - c = \dot{z}$. What are 
the relevant equations? 4 


3 If x and z are functions of y, the tangent line is the intersection of the planes 
$x - a = g_1'(b)(y - b)$ and $z - c = g_2'(b)(y - b)$. 

If x and y are functions of z, it is the intersection of the planes 

$x - a = k_1'(c)(z - c)$ and $y - b = k_2'(c)(z - c)$. 


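4 The tangent space can be written as 

$$\begin{bmatrix} \dot{y} \\ \dot{z} \end{bmatrix} = \begin{bmatrix} f_1'(a)(\dot{x}) \\ f_2'(a)(\dot{x}) \end{bmatrix} \quad \text{or} \quad \begin{bmatrix} \dot{x} \\ \dot{z} \end{bmatrix} = \begin{bmatrix} g_1'(b)(\dot{y}) \\ g_2'(b)(\dot{y}) \end{bmatrix} \quad \text{or} \quad \begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix} = \begin{bmatrix} k_1'(c)(\dot{z}) \\ k_2'(c)(\dot{z}) \end{bmatrix}.$$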
Since the range of $[D\mathbf{F}(\mathbf{a})]$ is 
$\mathbb{R}^2$, saying that it has rank 2 is 
the same as saying that it is onto; 
both are ways of saying that its 
columns span $\mathbb{R}^2$. 


In Equation 3.1.24, the partial 
derivatives on the right-hand side 
are evaluated at $\mathbf{a} = \begin{pmatrix} a \\ b \\ c \end{pmatrix}$. The 
derivative of the implicit function 
$\mathbf{k}$ is evaluated at $c$; it is a function 
of one variable, z, and is not 
defined at $\mathbf{a}$. 


Here $[D\mathbf{F}(\mathbf{a})]$ is a $2 \times 3$ matrix, 
so the partial derivatives are vectors, 
not numbers; because they 
are vectors we write them with arrows, 
as in $\overrightarrow{D_1F}(\mathbf{a})$. 

Once again, we distinguish be- 
tween the tangent line and the 
tangent space, which is the set of 
vectors from the point of tangency 
to a point of the tangent line. 


This should look familiar; we 
did the same thing in Equations 
3.1.20-3.1.22. 


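Proposition 3.1.19 says that another natural way to think of a smooth curve 
in $\mathbb{R}^3$ is as the intersection of two surfaces. If the surfaces $S_1$ and $S_2$ are given 
by equations $f_1(\mathbf{x}) = 0$ and $f_2(\mathbf{x}) = 0$, then $C = S_1 \cap S_2$ is given by the equation 
$\mathbf{F}(\mathbf{x}) = 0$, where $\mathbf{F}(\mathbf{x}) = \begin{pmatrix} f_1(\mathbf{x}) \\ f_2(\mathbf{x}) \end{pmatrix}$ is a mapping from $\mathbb{R}^3$ to $\mathbb{R}^2$. 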

Below we speak of the derivative having rank 2 instead of the derivative 
being onto; as the margin note explains, in this case the two mean the same 
thing. 


Proposition 3.1.19 (Smooth curves in $\mathbb R^3$). (a) Let $U \subset \mathbb R^3$ be open, $F : U \to \mathbb R^2$ be differentiable with Lipschitz derivative, and let C be the set of equation $F(\mathbf x) = \mathbf 0$. If $[DF(\mathbf a)]$ has rank 2 for every $\mathbf a \in C$, then C is a smooth curve in $\mathbb R^3$.

(b) The tangent vector space to C at $\mathbf a$ is $\ker[DF(\mathbf a)]$.


Proof. Once more, this is the implicit function theorem. Let a be a point of 
C. Since [DF(a)] is a 2 x 3 matrix with rank 2, it has two columns that are 
linearly independent. By changing the names of the variables, we may assume 
that they are the first two. Then the implicit function theorem asserts that near 
a, x and y are expressed implicitly as functions of z by the relation F(x) = 0. 

The implicit function theorem further tells us (Equation 2.9.25) that the derivative of the implicit function k is

$$[Dk(c)] = -\underbrace{[\vec{D_1}F(\mathbf a),\ \vec{D_2}F(\mathbf a)]^{-1}}_{\text{partial deriv. for pivotal variables}}\ \underbrace{[\vec{D_3}F(\mathbf a)]}_{\text{for non-pivotal variable}}. \tag{3.1.24}$$

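We saw (footnote 4) that the tangent space is the subspace of equation

$$\begin{bmatrix}\dot x\\ \dot y\end{bmatrix} = \begin{bmatrix} k_1'(c)\,\dot z\\ k_2'(c)\,\dot z\end{bmatrix} = [Dk(c)]\,\dot z, \tag{3.1.25}$$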


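where once more $\dot x$, $\dot y$ and $\dot z$ are increments to x, y and z. Inserting the value of $[Dk(c)]$ from Equation 3.1.24 and multiplying through by $[\vec{D_1}F(\mathbf a), \vec{D_2}F(\mathbf a)]$ gives

$$-[\vec{D_1}F(\mathbf a), \vec{D_2}F(\mathbf a)]\,[\vec{D_1}F(\mathbf a), \vec{D_2}F(\mathbf a)]^{-1}\,[\vec{D_3}F(\mathbf a)]\,\dot z = [\vec{D_1}F(\mathbf a), \vec{D_2}F(\mathbf a)]\begin{bmatrix}\dot x\\ \dot y\end{bmatrix},$$

so

$$\mathbf 0 = \underbrace{[\vec{D_1}F(\mathbf a),\ \vec{D_2}F(\mathbf a),\ \vec{D_3}F(\mathbf a)]}_{[DF(\mathbf a)]}\begin{bmatrix}\dot x\\ \dot y\\ \dot z\end{bmatrix}; \quad\text{i.e.,}\quad [DF(\mathbf a)]\begin{bmatrix}\dot x\\ \dot y\\ \dot z\end{bmatrix} = \mathbf 0. \quad\square \tag{3.1.26}$$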
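As a concrete check of part (b) of Proposition 3.1.19, here is a minimal sketch in Python. The particular curve (the unit sphere intersected with the plane x + y + z = 1), the point a, and the function names are illustrative assumptions, not taken from the text. It computes [Dk(c)] as in Equation 3.1.24 by solving a 2 × 2 system, then verifies that the resulting tangent vector lies in ker[DF(a)].

```python
# Illustrative example (an assumption, not from the text): C is the
# intersection of the unit sphere x^2 + y^2 + z^2 = 1 with the plane
# x + y + z = 1.

def F(x, y, z):
    return (x * x + y * y + z * z - 1, x + y + z - 1)

def DF(x, y, z):
    # 2 x 3 Jacobian matrix of F, one row per component of F.
    return ((2 * x, 2 * y, 2 * z),
            (1, 1, 1))

def implicit_derivative(jac):
    """Return [Dk(c)] = -[D1F, D2F]^{-1} [D3F] as the pair (k1'(c), k2'(c)).

    Assumes the first two columns of the Jacobian are linearly independent.
    """
    (a11, a12, b1), (a21, a22, b2) = jac
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("first two columns are not linearly independent")
    # Cramer's rule for [a11 a12; a21 a22] [k1; k2] = -[b1; b2].
    k1 = (-b1 * a22 + a12 * b2) / det
    k2 = (-a11 * b2 + b1 * a21) / det
    return k1, k2

a = (1, 0, 0)                 # a point of C: F(a) = (0, 0)
jac = DF(*a)
k1, k2 = implicit_derivative(jac)
tangent = (k1, k2, 1)         # increments (x-dot, y-dot, z-dot) with z-dot = 1
# Multiply [DF(a)] by the tangent vector; both entries should vanish.
image = tuple(sum(r * t for r, t in zip(row, tangent)) for row in jac)
print(F(*a), tangent, image)
```

Here the first two columns of [DF(a)] happen to be independent at the chosen point; at other points of C one might have to rename the variables first, exactly as in the proof.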
In Equation 3.1.27 we parametrize the surface by the variables x and y. But another part of the surface may be the graph of a function expressing x as a function of y and z; we would then be locally parametrizing the surface by the variables y and z.

Figure 3.1.9. A curve in the plane, known by the parametrization $t \mapsto \begin{pmatrix} t^2 - \sin t\\ 6\sin t\cos t\end{pmatrix}$.


Parametrizations of curves and surfaces 

We can think of curves and surfaces as being defined by equations, but there 
is another way to think of them (and of the higher-dimensional analogs we 
will encounter in Section 3.2): parametrizations. Actually, local parametriza- 
tions have been built into our definitions of curves and surfaces. Locally, as 
we have defined them, smooth curves and surfaces come provided both with 
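equations and parametrizations. The graph of $f\begin{pmatrix}x\\y\end{pmatrix}$ is both the locus of equation $z = f\begin{pmatrix}x\\y\end{pmatrix}$ (expressing z as a function of x and y) and the image of the parametrization

$$\begin{pmatrix}x\\y\end{pmatrix} \mapsto \begin{pmatrix}x\\ y\\ f\begin{pmatrix}x\\y\end{pmatrix}\end{pmatrix}. \tag{3.1.27}$$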


How would you interpret Example 3.1.4 (the unit circle) in terms of local 
parametrizations? 5 

Global parametrizations really represent a different way of thinking. 

The first thing to know about parametrizations is that practically any mapping is a “parametrization” of something.


The second thing to know about parametrizations is that trying to find a global parametrization for a curve or surface that you know by equations (or even worse, by a picture on a computer monitor) is very hard, and often impossible. There is no general rule for solving such problems.


By the first statement we mean that if you fill in the blanks of $t \mapsto \begin{pmatrix} - \\ - \end{pmatrix}$, where − represents a function of t ($t^3$, $\sin t$, whatever) and ask a computer to plot it, it will draw you something that looks like a curve in the plane. If you happen to choose $t \mapsto \begin{pmatrix}\cos t\\ \sin t\end{pmatrix}$, it will draw you a circle; $t \mapsto \begin{pmatrix}\cos t\\ \sin t\end{pmatrix}$ parametrizes the circle. If you choose $t \mapsto \begin{pmatrix} t^2 - \sin t\\ 6\sin t\cos t\end{pmatrix}$, you will get the curve shown in Figure 3.1.9.


⁵ In Example 3.1.4, where the unit circle $x^2 + y^2 = 1$ is composed of points $\begin{pmatrix}x\\y\end{pmatrix}$, we parametrized the top and bottom of the unit circle (y > 0 and y < 0) by x: we expressed the pivotal variable y as a function of the non-pivotal variable x using the functions $y = f(x) = \sqrt{1 - x^2}$ and $y = f(x) = -\sqrt{1 - x^2}$. In the neighborhood of the points $\begin{pmatrix}1\\0\end{pmatrix}$ and $\begin{pmatrix}-1\\0\end{pmatrix}$ we parametrized the circle by y: we expressed the pivotal variable x as a function of the non-pivotal variable y, using the functions $x = f(y) = \sqrt{1 - y^2}$ and $x = f(y) = -\sqrt{1 - y^2}$.
Figure 3.1.10. A curve in space, known by the parametrization $t \mapsto \begin{pmatrix}\cos t\\ \sin t\\ at\end{pmatrix}$.

If you choose three functions of t, the computer will draw something that looks like a curve in space; if you happen to choose $t \mapsto \begin{pmatrix}\cos t\\ \sin t\\ at\end{pmatrix}$, you’ll get the helix shown in Figure 3.1.10.

If you fill in the blanks of $\begin{pmatrix}u\\v\end{pmatrix} \mapsto \begin{pmatrix} - \\ - \\ - \end{pmatrix}$, where − represents a function of u and v (for example, $\sin^2 u \cos v$, or some such thing) the computer will draw you a surface in $\mathbb R^3$. The most famous parametrization of surfaces parametrizes the unit sphere in $\mathbb R^3$ by latitude u and longitude v:

$$\begin{pmatrix}u\\v\end{pmatrix} \mapsto \begin{pmatrix}\cos u\cos v\\ \cos u\sin v\\ \sin u\end{pmatrix}. \tag{3.1.28}$$

But virtually whatever you type in, the computer will draw you something. For example, if you type in $\begin{pmatrix}u\\v\end{pmatrix} \mapsto \begin{pmatrix} u^3\cos v\\ u^2 + v^2\\ v^2\cos u\end{pmatrix}$, you will get the surface shown in Figure 3.1.11.

Figure 3.1.11.

How does the computer do it? It plugs some numbers into the formulas to 
find points of the curve or surface, and then it connects up the dots. Finding 
points on a curve or surface that you know by a parametrization is easy. 
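The “plug in numbers and connect the dots” procedure can be sketched in a few lines of Python. This is only an illustration: the helper name `sample`, the parameter range and the number of sample points are assumptions, and the formula for the curve is the one read from the caption of Figure 3.1.9.

```python
import math

def gamma(t):
    # Parametrization of the plane curve of Figure 3.1.9, as read from its caption.
    return (t * t - math.sin(t), 6 * math.sin(t) * math.cos(t))

def sample(curve, t_min, t_max, n):
    """Return n + 1 points curve(t) for equally spaced t in [t_min, t_max]."""
    step = (t_max - t_min) / n
    return [curve(t_min + i * step) for i in range(n + 1)]

# A plotting program would now join consecutive points by line segments.
points = sample(gamma, -2.0, 2.0, 200)
print(len(points), points[0], points[-1])
```

Each point costs only a few function evaluations, which is why finding points of a parametrized curve is easy.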

But the curves or surfaces we get by such “parametrizations” are not nec- 
essarily smooth curves or surfaces. If you typed random parametrizations into 
a computer (as we hope you did), you will have noticed that often what you 
get is not a smooth curve or surface; the curve or surface may intersect itself, 
as shown in Figures 3.1.9 and 3.1.11. If we want to define parametrizations of 
smooth curves and surfaces, we must be more demanding. 


In Definition 3.1.20 we could write “$[D\gamma(t)]$ is one to one” instead of “$\vec\gamma'(t) \ne \vec 0$”; $\vec\gamma'(t)$ and $[D\gamma(t)]$ are the same column matrix, and the linear transformation given by the matrix $[D\gamma(t)]$ is one to one exactly when $\vec\gamma'(t) \ne \vec 0$.

Recall that $\gamma$ is pronounced “gamma.”

We could replace “one to one 
and onto” by “bijective.” 


Definition 3.1.20 (Parametrization of a curve). A parametrization of a smooth curve $C \subset \mathbb R^n$ is a mapping $\gamma : I \to C$ satisfying the following conditions:

(1) I is an open interval of $\mathbb R$.

(2) $\gamma$ is $C^1$, one to one, and onto.

(3) $\vec\gamma'(t) \ne \vec 0$ for every $t \in I$.

Think of I as an interval of time; if you are traveling along the curve, the parametrization tells you where you are on the curve at a given time, as shown in Figure 3.1.12.
In the case of surfaces, saying that $[D\gamma(\mathbf u)]$ is one to one is the same as saying that the two partial derivatives $\vec{D_1}\gamma$, $\vec{D_2}\gamma$ are linearly independent. (Recall that the kernel of a linear transformation represented by a matrix is 0 if and only if its columns are linearly independent; it takes two linearly independent vectors to span a plane, in this case the tangent plane.)

In the case of the parametrization of a curve (Definition 3.1.20), the requirement that $\vec\gamma'(t) \ne \vec 0$ could also be stated in these terms: for one vector, being linearly independent means not being 0.



FIGURE 3.1.12. We imagine a parametrized curve as an ant taking a walk in the 
plane or in space. The parametrization tells where the ant is at any particular time. 


Definition 3.1.21 (Parametrization of a surface). A parametrization of a surface $S \subset \mathbb R^3$ is a smooth mapping $\gamma : U \to S$ such that

(1) $U \subset \mathbb R^2$ is open.

(2) $\gamma$ is one to one and onto.

(3) $[D\gamma(\mathbf u)]$ is one to one for every $\mathbf u \in U$.


The parametrization $t \mapsto \begin{pmatrix}\cos t\\ \sin t\end{pmatrix}$, which parametrizes the circle, is of course not one to one, but its restriction to $(0, 2\pi)$ is; unfortunately, this restriction misses the point $\begin{pmatrix}1\\0\end{pmatrix}$.


It is generally far easier to get 
a picture of a curve or surface if 
you know it by a parametrization 
than if you know it by equations. 
In the case of the curve whose 
parametrization is given in Equa- 
tion 3.1.29, it will take a computer 
milliseconds to compute the coor- 
dinates of enough points to give 
you a good picture of the curve. 


It is rare to find a mapping $\gamma$ that meets the criteria for a parametrization given by Definitions 3.1.20 and 3.1.21, and which parametrizes the entire curve or surface. A circle is not like an open interval: if you bend a strip of tubing into a circle, the two endpoints become a single point. A cylinder is not like an open subset of the plane: if you roll up a piece of paper into a cylinder, two edges become a single line. Neither parametrization is one to one.

The sphere is similar. The parametrization by latitude and longitude (Equa- 
tion 3.1.28) satisfies our definition only if we remove the curve going from the 
North Pole to the South Pole through Greenwich (for example). 
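A minimal numerical sketch, in Python, of where the latitude and longitude parametrization of Equation 3.1.28 breaks the conditions of Definition 3.1.21. The helper names and the sample parameter values are illustrative assumptions. For two vectors in $\mathbb R^3$, linear independence is the same as having a nonzero cross product, and here the length of the cross product of the two partial derivatives works out to |cos u|.

```python
import math

def gamma(u, v):
    # Latitude u, longitude v (Equation 3.1.28).
    return (math.cos(u) * math.cos(v), math.cos(u) * math.sin(v), math.sin(u))

def partials(u, v):
    # The two partial derivatives D1 gamma and D2 gamma, computed by hand.
    d1 = (-math.sin(u) * math.cos(v), -math.sin(u) * math.sin(v), math.cos(u))
    d2 = (-math.cos(u) * math.sin(v), math.cos(u) * math.cos(v), 0.0)
    return d1, d2

def cross_norm(a, b):
    # Length of the cross product; zero exactly when a and b are dependent.
    c = (a[1] * b[2] - a[2] * b[1],
         a[2] * b[0] - a[0] * b[2],
         a[0] * b[1] - a[1] * b[0])
    return math.sqrt(sum(x * x for x in c))

mid_latitude = cross_norm(*partials(0.3, 1.0))        # equals |cos 0.3|
north_pole = cross_norm(*partials(math.pi / 2, 1.0))  # essentially 0
p, q = gamma(0.3, 0.0), gamma(0.3, 2 * math.pi)       # same point of the sphere
print(mid_latitude, north_pole, p, q)
```

So $[D\gamma(\mathbf u)]$ fails to be one to one at the poles, and $\gamma$ itself fails to be one to one along the meridian v = 0, which is why that curve has to be removed.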

Example 3.1.22 (Parametrizations vs. equations). If you know a curve 
by a global parametrization, it is easy to find points of the curve, but difficult 
to check whether a given point is on the curve. The opposite is true if you 
know the curve by an equation: then it may well be difficult to find points of 
the curve, but checking whether a point is on the curve is straightforward. For 
example, given the parametrization

$$t \mapsto \begin{pmatrix}\cos^3 t - 3\sin t\cos t\\ t^2 - t^5\end{pmatrix}, \tag{3.1.29}$$

you can find a point by substituting some value of t, like t = 0 or t = 1. But
checking whether some particular point $\begin{pmatrix}a\\b\end{pmatrix}$ is on the curve would be very
difficult. That would require showing that the set of nonlinear equations

$$a = \cos^3 t - 3\sin t\cos t, \qquad b = t^2 - t^5$$

has a solution.

Now suppose you are given the equation 

$$y + \sin xy + \cos(x + y) = 0, \tag{3.1.31}$$

which defines a different curve. It’s not clear how you would go about finding a point of the curve. But you could check whether a given point is on the curve simply by inserting the values for x and y in the equation.⁶ △
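The contrast in Example 3.1.22 can be made concrete with a short Python sketch. The function names and the tolerance `tol` are assumptions made for illustration; only the two formulas come from Equations 3.1.29 and 3.1.31.

```python
import math

def gamma(t):
    # Parametrization of Equation 3.1.29: producing points of the curve is easy.
    return (math.cos(t) ** 3 - 3 * math.sin(t) * math.cos(t), t ** 2 - t ** 5)

def h(x, y):
    # Left-hand side of Equation 3.1.31: testing a given point is easy.
    return y + math.sin(x * y) + math.cos(x + y)

def on_curve(x, y, tol=1e-9):
    """Membership test for the curve h(x, y) = 0, up to the tolerance tol."""
    return abs(h(x, y)) < tol

print(gamma(0.0), gamma(1.0))   # two points of the parametrized curve
print(on_curve(0.0, 0.0))       # h(0, 0) = 1, so the origin is not on the curve
```

Neither function helps with the opposite task: `gamma` gives no direct way to decide whether a given point is on its curve, and `h` gives no direct way to produce a point where it vanishes.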

Remark. It is not true that if $\gamma : I \to C$ is a smooth mapping satisfying $\vec\gamma'(t) \ne \vec 0$ for every t, then C is necessarily a smooth curve. Nor is it true that if $\gamma : U \to S$ is a smooth mapping such that $[D\gamma(\mathbf u)]$ is one to one, then necessarily S is a smooth surface. This is true only locally: if I and U are small enough, then the image of the corresponding $\gamma$ will be a smooth curve or smooth surface. A sketch of how to prove this is given in Exercise 3.1.20. △


3.2 Manifolds 


A mathematician trying to pic- 
ture a manifold is rather like a 
blindfolded person who has never 
met or seen a picture of an ele- 
phant seeking to identify one by 
patting first an ear, then the trunk 
or a leg. 


In Section 3.1 we explored smooth curves and surfaces. We saw that a subset $X \subset \mathbb R^2$ is a smooth curve if X is locally the graph of a differentiable function, either of x in terms of y or of y in terms of x. We saw that $S \subset \mathbb R^3$ is a smooth surface if it is locally the graph of a differentiable function of one coordinate in terms of the other two. Often, we saw, a patchwork of graphs of functions is required to express a curve or a surface.

This generalizes nicely to higher dimensions. You may not be able to visualize a five-dimensional manifold (we can’t either), but you should be able to guess how we will determine whether some five-dimensional subset of $\mathbb R^n$ is a manifold: given a subset of $\mathbb R^n$ defined by equations, we use the implicit function theorem


⁶ You might think, why not use Newton’s method to find a point of the curve given by Equation 3.1.31? But Newton’s method requires that you know a point of the curve to start out. What we could do is wonder whether the curve crosses the y-axis. That means setting x = 0, which gives y + cos y = 0. This certainly has a solution by the intermediate value theorem: y + cos y is positive when y > 1, and negative when y < −1. So you might think that using Newton’s method starting at y = 0 should converge to a root. In fact, the inequality of Kantorovitch’s theorem (Equation 2.7.48) is not satisfied, so that convergence isn’t guaranteed. But starting at $y_0 = -\pi/4$ is guaranteed to work: this gives

$$\frac{M\,|f(y_0)|}{\big(f'(y_0)\big)^2} \le 0.027 < \frac{1}{2}.$$
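The computation in footnote 6 can be checked numerically. The Python sketch below is a hedged illustration, assuming f(y) = y + cos y and the Lipschitz ratio M = 1 for f′ (valid because |f″(y)| = |cos y| ≤ 1); the function names and the fixed number of Newton steps are choices made here, not part of the text.

```python
import math

def f(y):
    return y + math.cos(y)

def df(y):
    return 1 - math.sin(y)

def kantorovich_ratio(y0, M=1.0):
    # Left-hand side of the inequality M |f(y0)| / f'(y0)^2 <= 1/2.
    # M = 1 is a Lipschitz ratio for f', since |f''(y)| = |cos y| <= 1.
    return M * abs(f(y0)) / df(y0) ** 2

def newton(y0, steps=20):
    y = y0
    for _ in range(steps):
        y -= f(y) / df(y)
    return y

y0 = -math.pi / 4
ratio_at_zero = kantorovich_ratio(0.0)   # 1.0, larger than 1/2
ratio_at_y0 = kantorovich_ratio(y0)      # below 0.027
root = newton(y0)
print(ratio_at_zero, ratio_at_y0, root, f(root))
```

The root found this way gives a point (0, y) of the curve of Equation 3.1.31 on the y-axis.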
Making some kind of global sense of such a patchwork of graphs of functions can be quite challenging indeed, especially in higher dimensions. It is a subject full of open questions, some fully as interesting and demanding as, for example, Fermat’s last theorem, whose solution after more than three centuries aroused such passionate interest. Of particular interest are four-dimensional manifolds (4-manifolds), in part because of applications in representing spacetime.

This description is remarkably concise and remarkably uninformative. It isn’t even clear how many dimensions $X_2$ and $X_3$ have; this is typical when you know a set by equations.



Figure 3.2.1. One possible position of four linked rods, of lengths $l_1$, $l_2$, $l_3$, and $l_4$, restricted to a plane.


to determine whether every point of the subset has a neighborhood in which the 
subset is the graph of a function of several variables in terms of the others. If 
so, the set is a smooth manifold: manifolds are loci which are locally the graphs 
of functions expressing some of the standard coordinate functions in terms of 
others. Again, it is rare that a manifold is the graph of a single function. 


Example 3.2.1 (Linked rods). Linkages of rods are everywhere, in mechan- 
ics (consider a railway bridge or the Eiffel tower), in biology (the skeleton), in 
robotics, in chemistry. One of the simplest examples is formed of four rigid
rods, with assigned lengths $l_1, l_2, l_3, l_4 > 0$, connected by universal joints that
can achieve any position, to form a quadrilateral, as shown in Figure 3.2.1.

In order to guarantee that our sets are not empty, we will require that each 
rod be shorter than the sum of the other three. 

What is the set $X_2$ of positions the linkage can achieve if the points are restricted to a plane? Or the set $X_3$ of positions the linkage can achieve if the points are allowed to move in space? These sets are easy to describe by equations. For $X_2$ we have

$$X_2 = \text{the set of } (\mathbf x_1, \mathbf x_2, \mathbf x_3, \mathbf x_4) \in (\mathbb R^2)^4 \text{ such that} \tag{3.2.1}$$
$$|\mathbf x_1 - \mathbf x_2| = l_1, \quad |\mathbf x_2 - \mathbf x_3| = l_2, \quad |\mathbf x_3 - \mathbf x_4| = l_3, \quad |\mathbf x_4 - \mathbf x_1| = l_4.$$

Thus $X_2$ is a subset of $\mathbb R^8$. Another way of saying this is that $X_2$ is the subset defined by the equation $\mathbf f(\mathbf x) = \mathbf 0$, where $\mathbf f : (\mathbb R^2)^4 \to \mathbb R^4$ is the mapping

$$\mathbf f\left(\begin{pmatrix}x_1\\y_1\end{pmatrix}, \begin{pmatrix}x_2\\y_2\end{pmatrix}, \begin{pmatrix}x_3\\y_3\end{pmatrix}, \begin{pmatrix}x_4\\y_4\end{pmatrix}\right) = \begin{bmatrix}(x_2 - x_1)^2 + (y_2 - y_1)^2 - l_1^2\\ (x_3 - x_2)^2 + (y_3 - y_2)^2 - l_2^2\\ (x_4 - x_3)^2 + (y_4 - y_3)^2 - l_3^2\\ (x_1 - x_4)^2 + (y_1 - y_4)^2 - l_4^2\end{bmatrix}. \tag{3.2.2}$$

Similarly, the set $X_3$ of positions in space is also described by Equation 3.2.1, if we take $\mathbf x_i \in \mathbb R^3$; $X_3$ is a subset of $\mathbb R^{12}$. (Of course, to make equations corresponding to Equation 3.2.2 we would have to add a third entry to the $\mathbf x_i$, and instead of writing $(x_2 - x_1)^2 + (y_2 - y_1)^2 - l_1^2$ we would need to write $(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2 - l_1^2$.)

Can we express some of the $\mathbf x_i$ as functions of the others? You should feel, on physical grounds, that if the linkage is sitting on the floor, you can move two opposite connectors any way you like, and that the linkage will follow in a unique way. This is not quite to say that $\mathbf x_2$ and $\mathbf x_4$ are a function of $\mathbf x_1$ and $\mathbf x_3$ (or that $\mathbf x_1$ and $\mathbf x_3$ are a function of $\mathbf x_2$ and $\mathbf x_4$). This isn’t true, as is suggested by Figure 3.2.2.

In fact, usually knowing $\mathbf x_1$ and $\mathbf x_3$ determines either no positions of the linkage (if $\mathbf x_1$ and $\mathbf x_3$ are farther apart than $l_1 + l_2$ or $l_3 + l_4$) or exactly four (if a few other conditions are met; see Exercise 3.2.3). But $\mathbf x_2$ and $\mathbf x_4$ are locally functions of $\mathbf x_1, \mathbf x_3$. It is true that for a given $\mathbf x_1$ and $\mathbf x_3$, four positions



You could experiment with this system of linked rods by cutting straws into four pieces of different lengths and stringing them together. For a more complex system, try five pieces.

If you object that you cannot visualize what this manifold looks like, you have our sympathy; neither can we. Precisely for this reason, it gives a good idea of the kind of problem that comes up: you have a collection of equations defining some set but you have no idea what the set looks like. For example, as of this writing we don’t know precisely when $X_2$ is connected, that is, whether we can move continuously from any point in $X_2$ to any other point in $X_2$. (A manifold can be disconnected, as we saw already in the case of smooth curves, in Example 3.1.6.) It would take a bit of thought to figure out for what lengths of bars $X_2$ is, or isn’t, connected.



Figure 3.2.3. If three vertices are aligned, the end-vertices cannot move freely: for instance, they can’t move in the directions of the arrows without stretching the rods.
are possible in all, but if you move $\mathbf x_1$ and $\mathbf x_3$ a small amount from a given position, only one position of $\mathbf x_2$ and $\mathbf x_4$ is near the old position of $\mathbf x_2$ and $\mathbf x_4$. Locally, knowing $\mathbf x_1$ and $\mathbf x_3$ uniquely determines $\mathbf x_2$ and $\mathbf x_4$.



FIGURE 3.2.2. Two of the possible positions of a linkage with the same $\mathbf x_1$ and $\mathbf x_3$ are shown in solid and dotted lines. The other two are $\mathbf x_1, \mathbf x_2, \mathbf x_3, \mathbf x_4'$ and $\mathbf x_1, \mathbf x_2', \mathbf x_3, \mathbf x_4$.

Even this isn’t always true: if any three are aligned, or if one rod is folded back against another, as shown in Figure 3.2.3, then the endpoints cannot be used as parameters (as the variables that determine the values of the other variables). For example, if $\mathbf x_1$, $\mathbf x_2$ and $\mathbf x_3$ are aligned, then you cannot move $\mathbf x_1$ and $\mathbf x_3$ arbitrarily, as the rods cannot be stretched. But it is still true that the position is locally a function of $\mathbf x_2$ and $\mathbf x_4$.

There are many other possibilities: for instance, we could choose $\mathbf x_2$ and $\mathbf x_4$ as the variables that locally determine $\mathbf x_1$ and $\mathbf x_3$, again making $X_2$ locally a graph. Or we could use the coordinates of $\mathbf x_1$ (two numbers), the polar angle of the first rod with the horizontal line passing through $\mathbf x_1$ (one number), and the angle between the first and the second (one number): four numbers in all, the same number we get using the coordinates of $\mathbf x_1$ and $\mathbf x_3$.⁷ We said above that usually knowing $\mathbf x_1$ and $\mathbf x_3$ determines either no positions of the linkage or exactly four positions. Exercise 3.2.4 asks you to determine how many positions are possible using $\mathbf x_1$ and the two angles above, again except in a few cases. Exercise 3.2.5 asks you to describe $X_2$ and $X_3$ when $l_1 = l_2 + l_3 + l_4$. △


A manifold: locally the graph of a function 

The set $X_2$ of Example 3.2.1 is a four-dimensional manifold in $\mathbb R^8$; locally, it is
the graph of a function expressing four variables (two coordinates each for two 
points) in terms of four other variables (the coordinates of the other two points 

⁷ Such a system is said to have four degrees of freedom.
or some other choice). It doesn’t have to be the same function everywhere. In most neighborhoods, $X_2$ is the graph of a function of $\mathbf x_1$ and $\mathbf x_3$, but we saw that this is not true when $\mathbf x_1$, $\mathbf x_2$ and $\mathbf x_3$ are aligned; near such points, $X_2$ is the graph of a function expressing $\mathbf x_1$ and $\mathbf x_3$ in terms of $\mathbf x_2$ and $\mathbf x_4$.⁸

Now it’s time to define a manifold more precisely.


Definition 3.2.2 is not friendly. 
Unfortunately, it is difficult to be 
precise about what it means to be 
“locally the graph of a function” 
without getting involved. But 
we have seen examples of just 
what this means in the case of 
1-manifolds (curves) and 2-mani- 
folds (surfaces), in Section 3.1. 

A k-manifold in $\mathbb R^n$ is locally the graph of a mapping expressing n − k variables in terms of the other k variables.


Definition 3.2.2 (Manifold). A subset $M \subset \mathbb R^n$ is a k-dimensional manifold embedded in $\mathbb R^n$ if it is locally the graph of a $C^1$ mapping expressing n − k variables as functions of the other k variables. More precisely, for every $\mathbf x \in M$, we can find

(1) k standard basis vectors $\vec e_{i_1}, \dots, \vec e_{i_k}$ corresponding to the k variables that, near $\mathbf x$, will determine the values of the other variables. Denote by $E_1$ the span of these, and by $E_2$ the span of the remaining n − k standard basis vectors; let $\mathbf x_1$ be the projection of $\mathbf x$ onto $E_1$, and $\mathbf x_2$ its projection onto $E_2$;

(2) a neighborhood U of $\mathbf x$ in $\mathbb R^n$;

(3) a neighborhood $U_1$ of $\mathbf x_1$ in $E_1$;

(4) a mapping $\mathbf f : U_1 \to E_2$;

such that $M \cap U$ is the graph of $\mathbf f$.


If $U \subset \mathbb R^n$ is open, then U is a manifold. This corresponds to the case where $E_1 = \mathbb R^n$, $E_2 = \{0\}$.


Figure 3.2.4 reinterprets Figure 
3.1.1 (illustrating a smooth curve) 
in the language of Definition 3.2.2. 



Figure 3.2.4. In the neighborhood of $\mathbf x$, the curve is the graph of a function expressing x in terms of y. The point $\mathbf x_1$ is the projection of $\mathbf x$ onto $E_1$ (i.e., the y-axis); the point $\mathbf x_2$ is its projection onto $E_2$ (i.e., the x-axis). In the neighborhood of $\mathbf a$, we can consider the curve the graph of a function expressing y in terms of x. For this point, $E_1$ is the x-axis, and $E_2$ is the y-axis.

⁸ For some lengths, $X_2$ is no longer a manifold in a neighborhood of some positions: if all four lengths are equal, then $X_2$ is not a manifold near the position where it is folded flat.
A curve in $\mathbb R^2$ is a 1-manifold in $\mathbb R^2$; a surface in $\mathbb R^3$ is a 2-manifold in $\mathbb R^3$; a curve in $\mathbb R^3$ is a 1-manifold in $\mathbb R^3$.

Recall that for both curves in $\mathbb R^2$ and surfaces in $\mathbb R^3$, we had n − k = 1 variable expressed as a function of the other k variables. For curves in $\mathbb R^3$, there are n − k = 2 variables expressed as a function of one variable; in Example 3.2.1 we saw that for $X_2$, we had four variables expressed as a function of four other variables: $X_2$ is a 4-manifold in $\mathbb R^8$.

Of course, once manifolds get a bit more complicated it is impossible to draw
them or even visualize them. So it’s not obvious how to use Definition 3.2.2 to 
see whether a set is a manifold. Fortunately, Theorem 3.2.3 will give us a more 
useful criterion. 


Since $\mathbf f : U \to \mathbb R^{n-k}$, saying that $[D\mathbf f(\mathbf x)]$ is onto is the same as saying that it has n − k linearly independent columns, which is the same as saying that those n − k columns span $\mathbb R^{n-k}$: the equation

$$[D\mathbf f(\mathbf x)]\vec v = \vec b$$

has a solution for every $\vec b \in \mathbb R^{n-k}$. (This is the crucial hypothesis of the stripped-down version of the implicit function theorem, Theorem 2.9.9.)


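In the proof of Theorem 3.2.3 we would prefer to write $\mathbf f\begin{pmatrix}\mathbf u\\ \mathbf g(\mathbf u)\end{pmatrix} = \mathbf 0$ rather than $\mathbf f(\mathbf u + \mathbf g(\mathbf u)) = \mathbf 0$, but that’s not quite right because $E_1$ may not be spanned by the first k basis vectors. We have $\mathbf u \in E_1$ and $\mathbf g(\mathbf u) \in E_2$; since both $E_1$ and $E_2$ are subspaces of $\mathbb R^n$, it makes sense to add them, and $\mathbf u + \mathbf g(\mathbf u)$ is a point of the graph of $\mathbf g$. This is a fiddly point; if you find it easier to think of $\mathbf f\begin{pmatrix}\mathbf u\\ \mathbf g(\mathbf u)\end{pmatrix}$, go ahead; just pretend that $E_1$ is spanned by $\vec e_1, \dots, \vec e_k$, and $E_2$ by $\vec e_{k+1}, \dots, \vec e_n$.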


Manifolds known by equations 

How do we know that our linkage spaces $X_2$ and $X_3$ of Example 3.2.1 are
manifolds? Our argument used some sort of intuition about how the linkage 
would move if we moved various points on it, and although we could prove 
this using a bit of trigonometry, we want to see directly that it is a manifold 
from Equation 3.2.1. This is a matter of saying that f(x) = 0 expresses some 
variables implicitly as functions of others, and this is exactly what the implicit 
function theorem is for. 

Theorem 3.2.3 (Knowing a manifold by equations). Let $U \subset \mathbb R^n$ be an open subset, and $\mathbf f : U \to \mathbb R^{n-k}$ be a differentiable mapping with Lipschitz derivative (for instance a $C^2$ mapping). Let $M \subset U$ be the set of solutions to the equation $\mathbf f(\mathbf x) = \mathbf 0$.

If $[D\mathbf f(\mathbf x)]$ is onto for every $\mathbf x \in M$, then M is a k-dimensional manifold embedded in $\mathbb R^n$.

This theorem is a generalization of part (a) of Theorems 3.1.9 (for curves) and 3.1.16 (for surfaces). Note that we cannot say, as we did for surfaces in Theorem 3.1.16, that M is a k-manifold if $[D\mathbf f(\mathbf x)] \ne 0$. Here $[D\mathbf f(\mathbf x)]$ is a matrix n − k high and n wide; it could be nonzero and still fail to be onto. Note also that k, the dimension of M, is n − (n − k), i.e., the dimension of the domain of $\mathbf f$ minus the dimension of its range.
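To see why a nonzero derivative is not enough, here is a small worked example with n = 3 and n − k = 2. The mapping is a hypothetical illustration chosen for this purpose, not one taken from the text.

```latex
\[
\mathbf f\begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{pmatrix}x\\ x^2\end{pmatrix},
\qquad
[D\mathbf f(\mathbf x)] = \begin{bmatrix}1 & 0 & 0\\ 2x & 0 & 0\end{bmatrix}.
\]
% The solution set M of f(x) = 0 is the plane x = 0, and on M the derivative is
\[
[D\mathbf f(\mathbf x)] = \begin{bmatrix}1 & 0 & 0\\ 0 & 0 & 0\end{bmatrix} \neq [0],
\]
% which has rank 1 < 2: it is nonzero, but it is not onto R^2.
```

Here M is the plane x = 0, a 2-dimensional manifold rather than the 1-dimensional manifold (k = 3 − 2 = 1) that the theorem would give if its hypothesis held; the theorem simply does not apply.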

Proof. This is very close to the statement of the implicit function theorem, Theorem 2.9.10. Choose n − k of the basis vectors $\vec e_i$ such that the corresponding columns of $[D\mathbf f(\mathbf x)]$ are linearly independent (corresponding to pivotal variables). Denote by $E_2$ the subspace of $\mathbb R^n$ spanned by these vectors, and by $E_1$ the subspace spanned by the remaining k standard basis vectors. Clearly $\dim E_2 = n - k$ and $\dim E_1 = k$.

Let $\mathbf x_1$ be the projection of $\mathbf x$ onto $E_1$, and $\mathbf x_2$ be its projection onto $E_2$. The implicit function theorem then says that there exists a ball $U_1$ around $\mathbf x_1$, a ball $U_2$ around $\mathbf x_2$ and a differentiable mapping $\mathbf g : U_1 \to U_2$ such that $\mathbf f(\mathbf u + \mathbf g(\mathbf u)) = \mathbf 0$, so that the graph of $\mathbf g$ is a subset of M. Moreover, if U is the set of points with $E_1$-coordinates in $U_1$ and $E_2$-coordinates in $U_2$, then the implicit function theorem guarantees that the graph of $\mathbf g$ is $M \cap U$. This proves the theorem. □


Example 3.2.4 (Using Theorem 3.2.3 to check that the linkage space $X_2$ is a manifold). In Example 3.2.1, $X_2$ is given by the equation

$$\mathbf f\begin{pmatrix}x_1\\y_1\\x_2\\y_2\\x_3\\y_3\\x_4\\y_4\end{pmatrix} = \begin{bmatrix}(x_2 - x_1)^2 + (y_2 - y_1)^2 - l_1^2\\ (x_3 - x_2)^2 + (y_3 - y_2)^2 - l_2^2\\ (x_4 - x_3)^2 + (y_4 - y_3)^2 - l_3^2\\ (x_1 - x_4)^2 + (y_1 - y_4)^2 - l_4^2\end{bmatrix} = \mathbf 0.$$

Each partial derivative at right is a vector with four entries: e.g.,

$$\vec{D_1}\mathbf f(\mathbf x) = \begin{bmatrix}D_1 f_1(\mathbf x)\\ D_1 f_2(\mathbf x)\\ D_1 f_3(\mathbf x)\\ D_1 f_4(\mathbf x)\end{bmatrix}$$

and so on.



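The derivative is composed of the eight partial derivatives (in the second line we label the partial derivatives explicitly by the names of the variables):

$$[D\mathbf f(\mathbf x)] = [\vec{D_1}\mathbf f(\mathbf x), \vec{D_2}\mathbf f(\mathbf x), \vec{D_3}\mathbf f(\mathbf x), \vec{D_4}\mathbf f(\mathbf x), \vec{D_5}\mathbf f(\mathbf x), \vec{D_6}\mathbf f(\mathbf x), \vec{D_7}\mathbf f(\mathbf x), \vec{D_8}\mathbf f(\mathbf x)]$$
$$= [\vec D_{x_1}\mathbf f(\mathbf x), \vec D_{y_1}\mathbf f(\mathbf x), \vec D_{x_2}\mathbf f(\mathbf x), \vec D_{y_2}\mathbf f(\mathbf x), \vec D_{x_3}\mathbf f(\mathbf x), \vec D_{y_3}\mathbf f(\mathbf x), \vec D_{x_4}\mathbf f(\mathbf x), \vec D_{y_4}\mathbf f(\mathbf x)].$$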


Unfortunately we had to put 
the matrix on two lines to make it 
fit. The second line contains the 
last four columns of the matrix. 


Computing the partial derivatives gives

$$[D\mathbf f(\mathbf x)] = \left[\begin{matrix} 2(x_1 - x_2) & 2(y_1 - y_2) & -2(x_1 - x_2) & -2(y_1 - y_2)\\ 0 & 0 & 2(x_2 - x_3) & 2(y_2 - y_3)\\ 0 & 0 & 0 & 0\\ -2(x_4 - x_1) & -2(y_4 - y_1) & 0 & 0 \end{matrix}\right.$$
$$\left.\begin{matrix} 0 & 0 & 0 & 0\\ -2(x_2 - x_3) & -2(y_2 - y_3) & 0 & 0\\ 2(x_3 - x_4) & 2(y_3 - y_4) & -2(x_3 - x_4) & -2(y_3 - y_4)\\ 0 & 0 & 2(x_4 - x_1) & 2(y_4 - y_1) \end{matrix}\right]. \tag{3.2.4}$$

If the points $\mathbf x_1$, $\mathbf x_2$, and $\mathbf x_3$ are aligned, then the first two columns of Equation 3.2.5 cannot be linearly independent: $y_1 - y_2$ is necessarily a multiple of $x_1 - x_2$, and $y_2 - y_3$ is a multiple of $x_2 - x_3$.


Since $\mathbf f$ is a mapping from $\mathbb R^8$ to $\mathbb R^4$, so that $E_2$ has dimension n − k = 4, four standard basis vectors can be used to span $E_2$ if the four corresponding column vectors are linearly independent. For instance, here you can never use the first four, or the last four, because in both cases there is a row of zeroes. How about the third, fourth, seventh, and eighth, i.e., the points $\mathbf x_2 = \begin{pmatrix}x_2\\y_2\end{pmatrix}$, $\mathbf x_4 = \begin{pmatrix}x_4\\y_4\end{pmatrix}$? These work as long as the corresponding columns of the matrix

$$\begin{bmatrix} -2(x_1 - x_2) & -2(y_1 - y_2) & 0 & 0\\ 2(x_2 - x_3) & 2(y_2 - y_3) & 0 & 0\\ 0 & 0 & -2(x_3 - x_4) & -2(y_3 - y_4)\\ 0 & 0 & 2(x_4 - x_1) & 2(y_4 - y_1) \end{bmatrix} \tag{3.2.5}$$
$$\begin{array}{cccc} \vec D_{x_2}\mathbf f(\mathbf x) & \vec D_{y_2}\mathbf f(\mathbf x) & \vec D_{x_4}\mathbf f(\mathbf x) & \vec D_{y_4}\mathbf f(\mathbf x) \end{array}$$
William Thurston, arguably the best geometer of the 20th century, says that the right way to know a k-dimensional manifold embedded in n-dimensional space is neither by equations nor by parametrizations but from the inside: imagine yourself inside the manifold, walking in the dark, aiming a flashlight first at one spot, then another. If you point the flashlight straight ahead, will you see anything? Will anything be reflected back? Or will you see the light to your side? . . .


are linearly independent. The first two columns are linearly independent precisely when $\mathbf x_1$, $\mathbf x_2$, and $\mathbf x_3$ are not aligned as they are in Figure 3.2.5, and the last two are linearly independent when $\mathbf x_3$, $\mathbf x_4$, and $\mathbf x_1$ are not aligned. The same argument holds for the first, second, fifth, and sixth columns, corresponding to $\mathbf x_1$ and $\mathbf x_3$. Thus you can use the positions of opposite points to locally parametrize $X_2$, as long as the other two points are aligned with neither of the two opposite points. The points are never all four in line, unless either one length is the sum of the other three, or $l_1 + l_2 = l_3 + l_4$, or $l_2 + l_3 = l_4 + l_1$. In all other cases, $X_2$ is a manifold, and even in these last two cases, it is a manifold except perhaps at the positions where all four rods are aligned.
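The column test of Example 3.2.4 is easy to try on specific positions. The following Python sketch is an illustration under stated assumptions: the two sample positions, the helper names, and the use of a determinant to test linear independence of four columns are choices made here; only the entries of the matrix come from Equation 3.2.4.

```python
def jacobian(p):
    """4 x 8 matrix [Df(x)] of Equation 3.2.4 for p = (x1, y1, ..., x4, y4)."""
    x1, y1, x2, y2, x3, y3, x4, y4 = p
    return [
        [2*(x1-x2), 2*(y1-y2), -2*(x1-x2), -2*(y1-y2), 0, 0, 0, 0],
        [0, 0, 2*(x2-x3), 2*(y2-y3), -2*(x2-x3), -2*(y2-y3), 0, 0],
        [0, 0, 0, 0, 2*(x3-x4), 2*(y3-y4), -2*(x3-x4), -2*(y3-y4)],
        [-2*(x4-x1), -2*(y4-y1), 0, 0, 0, 0, 2*(x4-x1), 2*(y4-y1)],
    ]

def det(m):
    """Determinant by cofactor expansion along the first row (fine for 4 x 4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def pivotal_det(p, cols):
    """Determinant of the 4 x 4 submatrix made of the given columns (0-based)."""
    return det([[row[c] for c in cols] for row in jacobian(p)])

square = (0, 0, 1, 0, 1, 1, 0, 1)    # unit square: no three vertices aligned
aligned = (0, 0, 1, 0, 2, 0, 1, 1)   # x1, x2, x3 lie on one line
cols_x2_x4 = [2, 3, 6, 7]            # columns for x2, y2, x4, y4 (Equation 3.2.5)
cols_x1_x3 = [0, 1, 4, 5]            # columns for x1, y1, x3, y3

d_square = pivotal_det(square, cols_x2_x4)         # nonzero: columns independent
d_aligned = pivotal_det(aligned, cols_x2_x4)       # zero: columns dependent
d_aligned_other = pivotal_det(aligned, cols_x1_x3) # nonzero again
print(d_square, d_aligned, d_aligned_other)
```

In the aligned position the columns for $\mathbf x_2$ and $\mathbf x_4$ fail the test, but those for $\mathbf x_1$ and $\mathbf x_3$ pass, in line with the earlier remark that near such positions $\mathbf x_1$ and $\mathbf x_3$ are locally functions of $\mathbf x_2$ and $\mathbf x_4$.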


Equations versus parametrizations 

As in the case of curves and surfaces, there are two different ways of knowing 
a manifold: equations and parametrizations. Usually we start with a set of 
equations. Technically, such a set of equations gives us a complete description 
of the manifold. In practice (as we saw in Example 3.1.22 and Equation 3.2.2) 
such a description is not satisfying; the information is not in a form that can be 
understood as a global picture of the manifold. Ideally, we also want to know
the manifold by a global parametrization; indeed, we would like to be able to 
move freely between these two representations. This duality repeats a theme of 
linear algebra, as suggested by Figure 3.2.6. 



                        Algorithms        Algebra                      Geometry

Linear Algebra          Row reduction     Inverses of matrices         Subspaces
                                          Solving linear equations     Kernels and images

Differential Calculus   Newton’s method   Inverse function theorem     Manifolds
                                          Implicit function theorem    Defining manifolds by equations
                                                                       and parametrizations

FIGURE 3.2.6. Correspondences: algorithms, algebra, geometry


Mappings that meet these cri- 
teria, and which parametrize the 
entire manifold, are rare. Choos- 
ing even a local parametrization 
that is well adapted to the prob- 
lem at hand is a difficult and im- 
portant skill, and exceedingly dif- 
ficult to teach. 


The definition of a parametrization of a manifold is simply a generalization 
of our definitions of a parametrization of a curve and of a surface: 

Definition 3.2.5 (Parametrization of a manifold). A parametrization of a k-dimensional manifold $M \subset \mathbb R^n$ is a mapping $\gamma : U \to M$ satisfying the following conditions:

(1) U is an open subset of $\mathbb R^k$.

(2) $\gamma$ is $C^1$, one to one, and onto;

(3) $[D\gamma(\mathbf u)]$ is one to one for every $\mathbf u \in U$.
The tangent space to a manifold


In this sense, a manifold is a 
surface in space (possibly, higher- 
dimensional space) that looks flat 
if you look closely at a small re- 
gion. 

As mentioned in Section 3.1, the tangent space will be essential in the discussion of constrained extrema, in Section 3.7, and in the discussion of orientation, in Section 6.5.


The essence of a k-dimensional differentiable manifold is that it is well approximated, near every point, by a k-dimensional subspace of $\mathbb R^n$. Everyone has an intuition of what this means: a curve is approximated by its tangent line at a point, a surface by its tangent plane.

Just as in the cases of curves and surfaces, we want to distinguish the tangent vector space $T_{\mathbf x}M$ to a manifold M at a point $\mathbf x \in M$ from the tangent line, plane ... to the manifold at $\mathbf x$. The tangent space $T_{\mathbf x}M$ is the set of vectors tangent to M at $\mathbf x$.

Definition 3.2.6 (Tangent space of a manifold). Let $M \subset \mathbb R^n$ be a k-dimensional manifold and let $\mathbf x \in M$, so that

• k standard basis vectors span $E_1$;

• the remaining n − k standard basis vectors span $E_2$;

• $U_1 \subset E_1$, $U \subset \mathbb R^n$ are open sets, and

• $\mathbf g : U_1 \to E_2$ is a $C^1$ mapping,

such that $\mathbf x \in U$ and $M \cap U$ is the graph of $\mathbf g$. Then the tangent vector space to the manifold at $\mathbf x$, denoted $T_{\mathbf x}M$, is the graph of $[D\mathbf g(\mathbf x_1)]$: the linear approximation to the graph is the graph of the linear approximation.


Part (b) of Theorems 3.1.9 (for
curves) and 3.1.16 (for surfaces)
are special cases of Theorem 3.2.7.


If we know a manifold by the equation $\mathbf{f} = \mathbf{0}$, then the tangent space to the
manifold is the kernel of the derivative of $\mathbf{f}$.

Theorem 3.2.7 (Tangent space to a manifold). If $\mathbf{f} = \mathbf{0}$ describes a
manifold, under the same conditions as in Theorem 3.2.3, then the tangent
space $T_{\mathbf{x}}M$ is the kernel of $[D\mathbf{f}(\mathbf{x})]$.


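Proof. Let $\mathbf{g}$ be the function of which $M$ is locally the graph, as discussed in
the proof of Theorem 3.2.3. The implicit function theorem gives not only the
existence of $\mathbf{g}$ but also its derivative (Equation 2.9.25): the matrix

$$[D\mathbf{g}(\mathbf{x}_1)] = -\underbrace{[D_{j_1}\mathbf{f}(\mathbf{x}), \dots, D_{j_{n-k}}\mathbf{f}(\mathbf{x})]^{-1}}_{\text{partial deriv. for pivotal variables}}\ \underbrace{[D_{i_1}\mathbf{f}(\mathbf{x}), \dots, D_{i_k}\mathbf{f}(\mathbf{x})]}_{\text{partial deriv. for non-pivotal variables}} \tag{3.2.6}$$

where $D_{j_1}, \dots, D_{j_{n-k}}$ are the partial derivatives with respect to the n − k pivotal
variables, and $D_{i_1}, \dots, D_{i_k}$ are the partial derivatives with respect to the k
non-pivotal variables.

By definition, the tangent space to $M$ at $\mathbf{x}$ is the graph of the derivative of
$\mathbf{g}$. Thus the tangent space is the space of equation

$$\mathbf{w} = -[D_{j_1}\mathbf{f}(\mathbf{x}), \dots, D_{j_{n-k}}\mathbf{f}(\mathbf{x})]^{-1}[D_{i_1}\mathbf{f}(\mathbf{x}), \dots, D_{i_k}\mathbf{f}(\mathbf{x})]\,\mathbf{v}, \tag{3.2.7}$$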


One thing needs checking: if 
the same manifold can be repre- 
sented as a graph in two differ- 
ent ways, then the tangent spaces 
should be the same. This should 
be clear from Theorem 3.2.7. In- 
deed, if an equation f(x) expresses 
some variables in terms of others 
in several different ways, then in 
all cases, the tangent space is the 
kernel of the derivative of f and 
does not depend on the choice of 
pivotal variables. 


In Theorem 3.2.8, $T^{-1}$ is not
an inverse mapping; indeed, since
$T$ goes from $\mathbb{R}^n$ to $\mathbb{R}^m$, such an
inverse mapping does not exist when
$n \ne m$. By $T^{-1}(M)$ we denote
the inverse image: the set of points
$\mathbf{x} \in \mathbb{R}^n$ such that $T(\mathbf{x})$ is in $M$.

A graph is automatically given
by an equation. For instance, the
graph of $f : \mathbb{R} \to \mathbb{R}$ is the curve of
equation $y - f(x) = 0$.


Corollary 3.2.9 follows immediately
from Theorem 3.2.8, as applied to $T^{-1}$:

$T(M) = (T^{-1})^{-1}(M)$.



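where $\mathbf{v}$ is a variable in $E_1$, and $\mathbf{w}$ is a variable in $E_2$. This can be rewritten

$$[D_{j_1}\mathbf{f}(\mathbf{x}), \dots, D_{j_{n-k}}\mathbf{f}(\mathbf{x})]\,\mathbf{w} + [D_{i_1}\mathbf{f}(\mathbf{x}), \dots, D_{i_k}\mathbf{f}(\mathbf{x})]\,\mathbf{v} = \mathbf{0}, \tag{3.2.8}$$

which is simply saying $[D\mathbf{f}(\mathbf{x})](\mathbf{v} + \mathbf{w}) = \mathbf{0}$. □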
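Theorem 3.2.7 can be tried out on a small case. The sketch below is only an illustration: the unit sphere, the base point, and the function names `Df` and `tangent_vector` are choices made here, not taken from the text. It solves Equation 3.2.7 for the pivotal increment and checks that the resulting vector lies in the kernel of the derivative.

```python
# Sketch: tangent space to the unit sphere f(x, y, z) = x^2 + y^2 + z^2 - 1 = 0.
# The sphere and the base point are illustrative assumptions.

def Df(p):
    """Row matrix [Df(p)] = [2x, 2y, 2z] for f = x^2 + y^2 + z^2 - 1."""
    x, y, z = p
    return [2 * x, 2 * y, 2 * z]

def tangent_vector(p, v):
    """Given non-pivotal increments v = (v1, v2), solve for the pivotal
    increment w as in Equation 3.2.7: w = -[D_z f]^{-1} [D_x f, D_y f] v.
    Requires D_z f(p) != 0, i.e. z is a valid pivotal variable at p."""
    dx, dy, dz = Df(p)
    w = -(dx * v[0] + dy * v[1]) / dz
    return (v[0], v[1], w)

p = (2 / 3, 1 / 3, 2 / 3)            # a point on the unit sphere
t = tangent_vector(p, (1.0, 1.0))    # one tangent vector at p
# Theorem 3.2.7: t should lie in the kernel of [Df(p)].
residual = sum(d * ti for d, ti in zip(Df(p), t))
print(t, residual)
```

The residual is zero up to rounding, whichever non-pivotal increments are chosen; choosing a different pivotal variable would describe the same plane.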

Manifolds are independent of coordinates 

We defined smooth curves, surfaces and higher-dimensional manifolds in terms 
of coordinate systems, but these objects are independent of coordinates; it 
doesn’t matter if you translate a curve in the plane, or rotate a surface in 
space. In fact Theorem 3.2.8 says a great deal more. 

Theorem 3.2.8. Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation which is
onto. If $M \subset \mathbb{R}^m$ is a smooth k-dimensional manifold, then $T^{-1}(M)$ is a
smooth manifold, of dimension k + n − m.

Proof. Choose a point $\mathbf{a} \in T^{-1}(M)$, and set $\mathbf{b} = T(\mathbf{a})$. Using the notation of
Definition 3.2.2, there exists a neighborhood $U$ of $\mathbf{b}$ such that the subset $M \cap U$
is defined by the equation $\mathbf{F}(\mathbf{x}) = \mathbf{0}$, where $\mathbf{F} : U \to E_2$ is given by

$$\mathbf{F}\begin{pmatrix}\mathbf{x}_1\\ \mathbf{x}_2\end{pmatrix} = \mathbf{f}(\mathbf{x}_1) - \mathbf{x}_2 = \mathbf{0}. \tag{3.2.9}$$

Moreover, $[D\mathbf{F}(\mathbf{b})]$ is certainly onto, since the columns corresponding to the
variables in $E_2$ make up the identity matrix.

The set $T^{-1}(M \cap U) = T^{-1}(M) \cap T^{-1}(U)$ is defined by the equation
$\mathbf{F} \circ T(\mathbf{y}) = \mathbf{0}$. Moreover,

$$[D(\mathbf{F} \circ T)(\mathbf{a})] = [D\mathbf{F}(T(\mathbf{a}))] \circ [DT(\mathbf{a})] = [D\mathbf{F}(\mathbf{b})] \circ T \tag{3.2.10}$$

is also onto, since it is a composition of two mappings which are both onto. So
$T^{-1}(M)$ is a manifold by Theorem 3.2.3.

For the dimension of the smooth manifold $T^{-1}(M)$, we use Theorem 3.2.3
to say that it is n (the dimension of the domain of $\mathbf{F} \circ T$) minus m − k (the
dimension of the range of $\mathbf{F} \circ T$), i.e., n − m + k. □
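A small worked instance may help fix the dimension count. The map $T$ and the manifold $M$ below are chosen only for illustration and do not come from the text.

```latex
% Illustrative instance of Theorem 3.2.8 (T and M are assumptions of this sketch).
% T is linear and onto; M is the unit circle, a smooth curve, so k = 1.
T : \mathbb{R}^3 \to \mathbb{R}^2, \quad
T\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix},
\qquad
M = \left\{ \begin{pmatrix} u \\ v \end{pmatrix} : u^2 + v^2 - 1 = 0 \right\} \subset \mathbb{R}^2 .
% The inverse image is a cylinder, a surface in R^3:
T^{-1}(M) = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} : x^2 + y^2 - 1 = 0 \right\},
\qquad
\dim T^{-1}(M) = k + n - m = 1 + 3 - 2 = 2 .
```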

Corollary 3.2.9 (Manifolds are independent of coordinates). If
$T : \mathbb{R}^m \to \mathbb{R}^m$ is an invertible linear transformation, and $M \subset \mathbb{R}^m$ is a
k-dimensional manifold, then $T(M)$ is also a k-dimensional manifold.


Corollary 3.2.9 says in particular that if you rotate a manifold the result is
still a manifold, and our definition, which appeared to be tied to the coordinate
system, is in fact coordinate-independent.

Almost the only functions that
can be computed are polynomials,
or rather piecewise polynomial
functions, also known as splines:
functions formed by stringing
together bits of different polynomials.
Splines can be computed,
since you can put if statements in
the program that computes your
function, allowing you to compute
different polynomials for different
values of the variables.
(Approximation by rational functions,
which involves division, is also
important in practical applications.)


One proof, sketched in Exercise
3.3.8, consists of using l’Hôpital’s
rule k times. The theorem is also
a special case of Taylor’s theorem
in several variables.


3.3 Taylor Polynomials in Several Variables

In Sections 3.1 and 3.2 we used first-degree approximations (derivatives) to 
discuss curves, surfaces and higher-dimensional manifolds. Now we will discuss 
higher-degree approximations, using Taylor polynomials. 

Approximation of functions by polynomials is a central issue in calculus 
in one and several variables. It is also of great importance in such fields as 
interpolation and curve fitting, computer graphics and computer aided design; 
when a computer graphs a function, most often it is approximating it with cubic 
piecewise polynomial functions. In Section 3.8 we will apply these notions to 
the geometry of curves and surfaces. (The geometry of manifolds is quite a bit 
harder.) 


Taylor’s theorem in one variable 

In one variable, you learned that at a point x near a, a function is well
approximated by its Taylor polynomial at a. Below, recall that $f^{(n)}$ denotes the
nth derivative of f.


Theorem 3.3.1 (Taylor’s theorem without remainder, one variable).
If $U \subset \mathbb{R}$ is an open subset and $f : U \to \mathbb{R}$ is k times continuously
differentiable on U, then the polynomial

$$P^k_{f,a}(a+h) = \underbrace{f(a) + f'(a)h + \frac{1}{2!}f''(a)h^2 + \dots + \frac{1}{k!}f^{(k)}(a)h^k}_{\text{Taylor polynomial}} \tag{3.3.1}$$

is the best approximation to f at a in the sense that it is the unique
polynomial of degree ≤ k such that

$$\lim_{h \to 0} \frac{f(a+h) - P^k_{f,a}(a+h)}{h^k} = 0. \tag{3.3.2}$$

We will see that there is a polynomial in n variables that in the same sense 
best approximates functions of n variables. 
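The limit in Equation 3.3.2 can be observed numerically. The following is a minimal sketch under assumptions made here (the function sin, the point a = 0.5, the degree k = 3, and the helper name `taylor_sin` are all illustrative choices): it prints the quotient for shrinking h.

```python
import math

def taylor_sin(a, h, k):
    """Degree-k Taylor polynomial of sin at a, evaluated at a + h (Equation 3.3.1).
    The derivatives of sin cycle through sin, cos, -sin, -cos."""
    derivs = [math.sin(a), math.cos(a), -math.sin(a), -math.cos(a)]
    return sum(derivs[i % 4] * h**i / math.factorial(i) for i in range(k + 1))

a, k = 0.5, 3
ratios = []
for h in (1e-1, 1e-2, 1e-3):
    # Quotient from Equation 3.3.2; it should shrink as h shrinks.
    ratio = (math.sin(a + h) - taylor_sin(a, h, k)) / h**k
    ratios.append(ratio)
    print(f"h = {h:g}: (f(a+h) - P(a+h)) / h^{k} = {ratio:.3e}")
```

The quotient shrinks roughly in proportion to h, which is what one expects when the first neglected term is of degree k + 1.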


Multi-exponent notation for polynomials in higher dimensions 

First we must introduce some notation. In one variable, it is easy to write the
“general polynomial” of degree k as

$$a_0 + a_1x + a_2x^2 + \dots + a_kx^k = \sum_{i=0}^{k} a_ix^i. \tag{3.3.3}$$

For example,

$$3 + 2x - x^2 + 4x^4 = 3x^0 + 2x^1 - 1x^2 + 0x^3 + 4x^4 \tag{3.3.4}$$

can be written as

$$\sum_{i=0}^{4} a_ix^i, \quad \text{where } a_0 = 3,\ a_1 = 2,\ a_2 = -1,\ a_3 = 0,\ a_4 = 4. \tag{3.3.5}$$

Polynomials in several variables
really are a lot more complicated
than in one variable: even
the first questions involving
factoring, division, etc. lead rapidly
to difficult problems in algebraic
geometry.

But it isn’t obvious how to find a “general notation” for expressions like

$$1 + x + yz + x^2 + xyz + y^2z - x^2y^2. \tag{3.3.6}$$

One effective if cumbersome notation uses multi-exponents. A multi-exponent
is a way of denoting one term of an expression like Equation 3.3.6.

Definition 3.3.2 (Multi-exponent). A multi-exponent I is an ordered
finite sequence of non-negative whole numbers, which definitely may include
0:

$$I = (i_1, \dots, i_n). \tag{3.3.7}$$

Example 3.3.3 (Multi-exponents). In the following polynomial with n = 3
variables:

$$1 + x + yz + x^2 + xyz + y^2z - x^2y^2, \tag{3.3.6}$$

each multi-exponent I can be used to describe one term:

$1 = x^0y^0z^0$ corresponds to $I = (0,0,0)$
$x = x^1y^0z^0$ corresponds to $I = (1,0,0)$      3.3.8
$yz = x^0y^1z^1$ corresponds to $I = (0,1,1)$. △

What multi-exponents describe the terms $x^2$, $xyz$, $y^2z$, and $x^2y^2$?⁹

The set of multi-exponents with n entries is denoted $\mathcal{I}_n$:

$$\mathcal{I}_n = \{(i_1, \dots, i_n)\}. \tag{3.3.9}$$

The set $\mathcal{I}_3$ includes the seven multi-exponents of Equation 3.3.8, but many
others as well, for example $I = (0,1,0)$, which corresponds to the term y,
and $I = (2,2,2)$, which corresponds to the term $x^2y^2z^2$. (In the case of the
polynomial of Equation 3.3.6, these terms have coefficient 0.)

We can group together elements of I n according to their degree: 

9 $x^2 = x^2y^0z^0$ corresponds to $I = (2,0,0)$.
$xyz = x^1y^1z^1$ corresponds to $I = (1,1,1)$.
$y^2z = x^0y^2z^1$ corresponds to $I = (0,2,1)$.
$x^2y^2 = x^2y^2z^0$ corresponds to $I = (2,2,0)$.

For example, the set $\mathcal{I}_3^2$ of
multi-exponents with three entries
and total degree 2 consists of
(0,1,1), (1,1,0), (1,0,1), (2,0,0),
(0,2,0), and (0,0,2).

Recall that 0! = 1, not 0.

For example, if I = (2,0,3),
then I! = 2!0!3! = 12.


Definition 3.3.4 (Degree of a multi-exponent). For any multi-exponent
$I \in \mathcal{I}_n$, the total degree of I is $\deg I = i_1 + \dots + i_n$.

The degree of xyz is 3, since 1 + 1 + 1 = 3; the degree of $y^2z$ is also 3.

Definition 3.3.5 (I!). For any multi-exponent $I \in \mathcal{I}_n$,

$$I! = i_1! \cdot \ldots \cdot i_n!. \tag{3.3.10}$$

Definition 3.3.6 ($\mathcal{I}_n^k$). We denote by $\mathcal{I}_n^k$ the set of multi-exponents with n
entries and of total degree k.


5; 


The monomial x%xi is of degree 
it can be written 


What are the elements of the set $\mathcal{I}_2^3$? Of $\mathcal{I}_3^3$? Check your answers below.¹⁰
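Definitions 3.3.4 through 3.3.6 translate directly into a few lines of code. The sketch below is an illustration only; the function names `multi_exponents` and `multi_factorial` are hypothetical, and tuples of integers stand in for multi-exponents.

```python
from itertools import product
from math import factorial, prod

def multi_exponents(n, k):
    """All multi-exponents with n entries and total degree k (Definition 3.3.6)."""
    return [I for I in product(range(k + 1), repeat=n) if sum(I) == k]

def multi_factorial(I):
    """I! = i_1! * ... * i_n! (Definition 3.3.5)."""
    return prod(factorial(i) for i in I)

print(multi_exponents(3, 2))          # three entries, total degree 2
print(multi_factorial((2, 0, 3)))     # 2! * 0! * 3!
```

Filtering the full product is wasteful for large n or k, but it keeps the correspondence with the definitions easy to read.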

Using multi-exponents, we can break up a polynomial into a sum of mono- 
mials (as we already did in Equation 3.3.8). 


In Equation 3.3.12, m is just
a placeholder indicating the degree.
To write a polynomial with
n variables, first we consider the
single multi-exponent I of degree
m = 0, and determine its coefficient.
Next we consider the set $\mathcal{I}_n^1$
(multi-exponents of degree m = 1)
and for each we determine its
coefficient. Then we consider the
set $\mathcal{I}_n^2$ (multi-exponents of degree
m = 2), and so on. Note that we
could use the multi-exponent
notation without grouping by degree,
expressing a polynomial as

$$\sum_{I \in \mathcal{I}_n} a_I\mathbf{x}^I.$$

But it is often useful to group
together terms of a polynomial
by degree: constant term, linear
terms, quadratic terms, cubic
terms, etc.


Definition 3.3.7 (Monomial). For any $I \in \mathcal{I}_n$, the function
$\mathbf{x}^I = x_1^{i_1} \cdots x_n^{i_n}$ on $\mathbb{R}^n$ will be called a monomial of degree deg I.


Here $i_1$ gives the power of $x_1$, while $i_2$ gives the power of $x_2$, and so on. If
I = (2, 3, 1), then $\mathbf{x}^I$ is a monomial of degree 6:

$$\mathbf{x}^I = \mathbf{x}^{(2,3,1)} = x_1^2x_2^3x_3^1. \tag{3.3.11}$$

We can now write the general polynomial of degree k as a sum of monomials,
each with its own coefficient $a_I$:

$$\sum_{m=0}^{k} \sum_{I \in \mathcal{I}_n^m} a_I\mathbf{x}^I. \tag{3.3.12}$$


Example 3.3.8 (Multi-exponent notation). To apply this notation to the
polynomial

$$2 + x_1 - x_2x_3 + 4x_1x_2x_3 + 2x_1^2x_2^2, \tag{3.3.13}$$

we break it up into the terms:

$2 = 2x_1^0x_2^0x_3^0$:  $I = (0,0,0)$, degree 0, with coefficient 2
$x_1 = 1x_1^1x_2^0x_3^0$:  $I = (1,0,0)$, degree 1, with coefficient 1
$-x_2x_3 = -1x_1^0x_2^1x_3^1$:  $I = (0,1,1)$, degree 2, with coefficient −1
$4x_1x_2x_3 = 4x_1^1x_2^1x_3^1$:  $I = (1,1,1)$, degree 3, with coefficient 4
$2x_1^2x_2^2 = 2x_1^2x_2^2x_3^0$:  $I = (2,2,0)$, degree 4, with coefficient 2.


10 $\mathcal{I}_2^3 = \{(1,2), (2,1), (0,3), (3,0)\}$; $\mathcal{I}_3^3 = \{(1,1,1), (2,1,0), (2,0,1), (1,2,0),
(1,0,2), (0,2,1), (0,1,2), (3,0,0), (0,3,0), (0,0,3)\}$.

We write $I \in \mathcal{I}_3^m$ under the
second sum in Equation 3.3.14
because the multi-exponents I that
we are summing are sequences of
three numbers (exponents for $x_1$, $x_2$ and $x_3$), and
have total degree m.

Thus we can write the polynomial as

$$\sum_{m=0}^{4} \sum_{I \in \mathcal{I}_3^m} a_I\mathbf{x}^I, \quad \text{where} \tag{3.3.14}$$

$$a_{(0,0,0)} = 2,\quad a_{(1,0,0)} = 1,\quad a_{(0,1,1)} = -1,\quad a_{(1,1,1)} = 4,\quad a_{(2,2,0)} = 2, \tag{3.3.15}$$

and all other $a_I = 0$, for $I \in \mathcal{I}_3^m$, with m ≤ 4. (There are 30 such terms.) △
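One way to make Equations 3.3.14 and 3.3.15 concrete is to store only the nonzero coefficients, keyed by multi-exponent. This is a sketch, not a prescribed representation: the dictionary layout and the name `evaluate` are assumptions made here.

```python
from itertools import product

# Nonzero coefficients a_I of the polynomial in Equation 3.3.13,
# keyed by multi-exponent I = (i1, i2, i3).
coeffs = {(0, 0, 0): 2, (1, 0, 0): 1, (0, 1, 1): -1, (1, 1, 1): 4, (2, 2, 0): 2}

def evaluate(coeffs, x):
    """Sum of a_I * x^I over the stored multi-exponents."""
    total = 0
    for I, a in coeffs.items():
        term = a
        for xi, i in zip(x, I):
            term *= xi**i
        total += term
    return total

# Multi-exponents with 3 entries and total degree at most 4.
all_I = [I for I in product(range(5), repeat=3) if sum(I) <= 4]
zero_terms = [I for I in all_I if I not in coeffs]
print(len(all_I), len(zero_terms))
print(evaluate(coeffs, (1, 2, 3)))
```

Counting the multi-exponents of degree at most 4 that are absent from the dictionary recovers the number of zero coefficients mentioned in the example.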


What is the polynomial

$$\sum_{m=0}^{3} \sum_{I \in \mathcal{I}_2^m} a_I\mathbf{x}^I, \tag{3.3.16}$$

where $a_{(0,0)} = 3$, $a_{(1,0)} = -1$, $a_{(1,2)} = 3$, $a_{(2,1)} = 2$, and all the other
coefficients $a_I$ are 0? Check your answer below.¹¹

Exercise 3.3.6 provides more
practice with multi-exponent
notation.


Multi-exponent notation and equality of crossed partial 
derivatives 


Recall that different notations
for partial derivatives exist:

$$D_j(D_if)(\mathbf{a}) = \frac{\partial^2 f}{\partial x_j \partial x_i}(\mathbf{a}).$$


Multi-exponent notation also provides a concise way to describe the higher
partial derivatives in Taylor polynomials in higher dimensions. Recall (Definition
2.7.6) that if the function $D_if$ is differentiable, then its partial derivative with
respect to the jth variable, $D_j(D_if)$, exists¹²; it is called a second partial
derivative of f.

To apply multi-exponent notation to higher partial derivatives, let

$$D_If = D_1^{i_1}D_2^{i_2} \cdots D_n^{i_n}f. \tag{3.3.17}$$


For example, for a function f in three variables,

$$D_1(D_1(D_2(D_2f))) = D_1^2(D_2^2f) \quad \text{can be written} \quad D_1^2(D_2^2(D_3^0f)), \tag{3.3.18}$$

which can be written $D_{(2,2,0)}f$, i.e., $D_If$, where $I = (i_1, i_2, i_3) = (2,2,0)$.

Of course $D_If$ is only defined if
all partials up to order deg I exist,
and it is also a good idea to assume
that they are all continuous,
so that the order in which the
partials are calculated doesn’t matter
(Theorem 3.3.9).

What is $D_{(1,0,2)}f$, written in our standard notation for higher partial
derivatives? What is $D_{(0,1,1)}f$? Check your answer below.¹³

11 It is $3 - x_1 + 3x_1x_2^2 + 2x_1^2x_2$.

12 This assumes, of course, that $f : U \to \mathbb{R}$ is a differentiable function, and $U \subset \mathbb{R}^n$
is open.

13 The first is $D_1(D_3^2f)$, which can also be written $D_1(D_3(D_3f))$. The second is
$D_2(D_3f)$.

Recall, however, that a multi-exponent I is an ordered finite sequence of
non-negative whole numbers. Using multi-exponent notation, how can we
distinguish between $D_1(D_3f)$ and $D_3(D_1f)$? Both are written $D_{(1,0,1)}$. Similarly,
$D_{(1,1)}$ could denote $D_1(D_2f)$ or $D_2(D_1f)$. Is this a problem?

No. If you compute the second partials $D_1(D_3f)$ and $D_3(D_1f)$ of the
function $x^2 + xy^3 + xz$, you will see that they are equal:

$$D_1(D_3f)\begin{pmatrix}x\\y\\z\end{pmatrix} = D_3(D_1f)\begin{pmatrix}x\\y\\z\end{pmatrix} = 1. \tag{3.3.19}$$


We will see when we define
Taylor polynomials in higher
dimensions (Definition 3.3.15) that
a major benefit of multi-exponent
notation is that it takes advantage
of the equality of crossed partials,
writing them only once; for
instance, $D_1(D_2f)$ and $D_2(D_1f)$ are
written $D_{(1,1)}$.


Theorem 3.3.9 is a surprisingly
difficult result, proved in Appendix
A.6. In Exercise 4.5.11 we give
a very simple proof that uses
Fubini’s theorem.


Similarly, $D_1(D_2f) = D_2(D_1f)$, and $D_2(D_3f) = D_3(D_2f)$.
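The equality of the crossed partials of $x^2 + xy^3 + xz$ can also be checked numerically. The sketch below uses central differences; the helper name `partial`, the step size, and the test point are assumptions of this illustration, and finite differences only approximate the derivatives.

```python
def f(x, y, z):
    return x**2 + x * y**3 + x * z

def partial(g, i, eps=1e-4):
    """Central-difference approximation of the partial derivative D_i g
    (i = 0, 1, 2 for the first, second, third variable)."""
    def dg(*p):
        up = list(p)
        up[i] += eps
        dn = list(p)
        dn[i] -= eps
        return (g(*up) - g(*dn)) / (2 * eps)
    return dg

point = (0.7, -1.2, 2.5)                     # arbitrary test point
d1_d3 = partial(partial(f, 2), 0)(*point)    # approximates D_1(D_3 f)
d3_d1 = partial(partial(f, 0), 2)(*point)    # approximates D_3(D_1 f)
print(d1_d3, d3_d1)
```

Both values agree with the exact answer 1 up to rounding error, in either order of differentiation.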

Normally, crossed partials are equal. They can fail to be equal only if the 
second partials are not continuous; you are asked in Exercise 3.3.1 to verify that 
this is the case in Example 3.3.11. (Of course the second partials do not exist 
unless the first partials exist and are continuous, in fact, differentiable.) 

Theorem 3.3.9 (Crossed partials equal). Let $f : U \to \mathbb{R}$ be a function
such that all second partial derivatives exist and are continuous. Then for
every pair of variables $x_i, x_j$, the crossed partials are equal:

$$D_j(D_if)(\mathbf{a}) = D_i(D_jf)(\mathbf{a}). \tag{3.3.20}$$

Corollary 3.3.10. If $f : U \to \mathbb{R}$ is a function all of whose partial derivatives
up to order k are continuous, then the partial derivatives of order up to k do
not depend on the order in which they are computed.


For example, $D_i(D_j(D_kf))(\mathbf{a}) = D_k(D_j(D_if))(\mathbf{a})$, and so on.

The requirement that the partial derivatives be continuous is essential, as
shown by Example 3.3.11.


Don’t take this example too se- 
riously. The function / here is 
pathological; such things do not 
show up unless you go looking 
for them. You should think that 
crossed partials are equal. 


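Example 3.3.11 (A case where crossed partials aren’t equal). Consider
the function

$$f\begin{pmatrix}x\\y\end{pmatrix} = \begin{cases} \dfrac{xy(x^2 - y^2)}{x^2 + y^2} & \text{if } \begin{pmatrix}x\\y\end{pmatrix} \ne \begin{pmatrix}0\\0\end{pmatrix} \\[1ex] 0 & \text{if } \begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix}. \end{cases} \tag{3.3.21}$$

Then

$$D_1f\begin{pmatrix}x\\y\end{pmatrix} = \frac{4x^2y^3 + x^4y - y^5}{(x^2 + y^2)^2} \quad \text{and} \quad D_2f\begin{pmatrix}x\\y\end{pmatrix} = \frac{x^5 - 4x^3y^2 - xy^4}{(x^2 + y^2)^2} \tag{3.3.22}$$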


For example, take the polynomial
$x + 2x^2 + 3x^3$ (i.e., $a_1 = 1$, $a_2 = 2$, $a_3 = 3$). Then

$f'(x) = 1 + 4x + 9x^2$, so
$f'(0) = 1$; indeed, $1!\,a_1 = 1$.
$f''(x) = 4 + 18x$, so
$f''(0) = 4$; indeed, $2!\,a_2 = 4$.

$f^{(3)}(x) = 18$;

indeed, $3!\,a_3 = 6 \cdot 3 = 18$.

Evaluating the derivatives at 0
gets rid of terms that come from
higher-degree terms. For example,
in $f''(x) = 4 + 18x$, the 18x comes
from the original $3x^3$.


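In Proposition 3.3.12 we use J
to denote the multi-exponents we
sum over to express a polynomial,
and I to denote a particular
multi-exponent.

when $\begin{pmatrix}x\\y\end{pmatrix} \ne \begin{pmatrix}0\\0\end{pmatrix}$, and both partials vanish at the origin. So

$$D_1f\begin{pmatrix}0\\y\end{pmatrix} = \begin{cases} -y & \text{if } y \ne 0 \\ 0 & \text{if } y = 0 \end{cases} = -y \quad \text{and} \quad D_2f\begin{pmatrix}x\\0\end{pmatrix} = \begin{cases} x & \text{if } x \ne 0 \\ 0 & \text{if } x = 0 \end{cases} = x,$$

giving

$$D_2(D_1f)\begin{pmatrix}0\\y\end{pmatrix} = D_2(-y) = -1 \quad \text{and} \quad D_1(D_2f)\begin{pmatrix}x\\0\end{pmatrix} = D_1(x) = 1,$$

the first for any value of y and the second for any value of x; at the origin, the
crossed partials $D_2(D_1f)$ and $D_1(D_2f)$ are not equal. △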
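The failure of equality in Example 3.3.11 shows up directly in difference quotients at the origin. The sketch below takes the first partials from Equation 3.3.22 (with value 0 at the origin); the function names and the step `t` are illustrative assumptions.

```python
def D1f(x, y):
    """D_1 f from Equation 3.3.22; it vanishes at the origin."""
    if x == 0 and y == 0:
        return 0.0
    return (4 * x**2 * y**3 + x**4 * y - y**5) / (x**2 + y**2) ** 2

def D2f(x, y):
    """D_2 f from Equation 3.3.22; it vanishes at the origin."""
    if x == 0 and y == 0:
        return 0.0
    return (x**5 - 4 * x**3 * y**2 - x * y**4) / (x**2 + y**2) ** 2

t = 1e-3
# Difference quotient for D_2(D_1 f) at the origin: vary y in D_1 f.
d2_d1 = (D1f(0.0, t) - D1f(0.0, 0.0)) / t
# Difference quotient for D_1(D_2 f) at the origin: vary x in D_2 f.
d1_d2 = (D2f(t, 0.0) - D2f(0.0, 0.0)) / t
print(d2_d1, d1_d2)
```

The two quotients have opposite signs for every nonzero step, so shrinking `t` does not bring them together.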


The coefficients of polynomials as derivatives 

We can express the coefficients of a polynomial in one variable in terms of
the derivatives of the polynomial at 0. If p is a polynomial of degree k with
coefficients $a_0, \dots, a_k$, i.e.,

$$p(x) = a_0 + a_1x + a_2x^2 + \dots + a_kx^k, \tag{3.3.23}$$

then, denoting by $p^{(i)}$ the ith derivative of p, we have

$$i!\,a_i = p^{(i)}(0); \quad \text{i.e.,} \quad a_i = \frac{1}{i!}p^{(i)}(0). \tag{3.3.24}$$

Evaluating the ith derivative of a polynomial at 0 isolates the coefficient of
$x^i$: the ith derivative of lower terms vanishes, and the ith derivative of higher
degree terms contains positive powers of x, and vanishes (is 0) when evaluated
at 0.

We will want to translate this to the case of several variables. You may
wonder why. Our goal is to approximate differentiable functions by polynomials.
We will see in Proposition 3.3.19 that if, at a point a, all derivatives up to
order k of a function vanish, then the function is small in a neighborhood of
that point (small in a sense that depends on k). If we can manufacture a
polynomial with the same derivatives up to order k as the function we want to
approximate, then the function representing the difference between the function
being approximated and the polynomial approximating it will have vanishing
derivatives up to order k: hence it will be small.

So, how does Equation 3.3.23 translate to the case of several variables?

As in one variable, the coefficients of a polynomial in several variables can
be expressed in terms of the partial derivatives of the polynomial at 0.


Proposition 3.3.12 (Coefficients expressed in terms of partial
derivatives at 0). Let p be the polynomial

$$p(\mathbf{x}) = \sum_{m=0}^{k} \sum_{J \in \mathcal{I}_n^m} a_J\mathbf{x}^J. \tag{3.3.25}$$

Then for any particular $I \in \mathcal{I}_n$, we have $I!\,a_I = D_Ip(\mathbf{0})$.

For example, if $f(x) = x^2$, then
$f''(x) = 2$.


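Proof. First, note that it is sufficient to show that

$$D_I\mathbf{x}^I(\mathbf{0}) = I! \quad \text{and} \quad D_I\mathbf{x}^J(\mathbf{0}) = 0 \ \text{ for all } J \ne I. \tag{3.3.26}$$

We can see that this is enough by writing:

$$D_Ip(\mathbf{0}) = D_I\Bigl(\underbrace{\sum_{m=0}^{k} \sum_{J \in \mathcal{I}_n^m} a_J\mathbf{x}^J}_{p \text{ written in multi-exponent form}}\Bigr)(\mathbf{0}) = \sum_{m=0}^{k} \sum_{J \in \mathcal{I}_n^m} a_J D_I\mathbf{x}^J(\mathbf{0}); \tag{3.3.27}$$

if we prove the statements in Equation 3.3.26, then all the terms $a_JD_I\mathbf{x}^J(\mathbf{0})$
for $J \ne I$ drop out, leaving $D_Ip(\mathbf{0}) = I!\,a_I$.

If you find it hard to focus
on this proof written in multi-exponent
notation, look at Example 3.3.13.

To prove that $D_I\mathbf{x}^I(\mathbf{0}) = I!$, write

$$D_I\mathbf{x}^I = D_1^{i_1} \cdots D_n^{i_n}\, x_1^{i_1} \cdots x_n^{i_n} = D_1^{i_1}x_1^{i_1} \cdot \ldots \cdot D_n^{i_n}x_n^{i_n} = i_1! \cdots i_n! = I!. \tag{3.3.28}$$

To prove $D_I\mathbf{x}^J(\mathbf{0}) = 0$ for all $J \ne I$, write similarly

$$D_I\mathbf{x}^J = D_1^{i_1} \cdots D_n^{i_n}\, x_1^{j_1} \cdots x_n^{j_n} = D_1^{i_1}x_1^{j_1} \cdot \ldots \cdot D_n^{i_n}x_n^{j_n}. \tag{3.3.29}$$

At least one $j_m$ must be different from $i_m$, either bigger or smaller. If it is
smaller, then we see a higher derivative than the power, and the derivative is 0.
If it is bigger, then there is a positive power of $x_m$ left over after the derivative,
and evaluated at 0, we get 0 again. □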


Multi-exponent notation takes 
some getting used to: Example 
3.3.13 translates multi-exponent 
notation into more standard (and 
less concise) notation. 


Example 3.3.13 (Coefficients of a polynomial in terms of its partial
derivatives at 0). What is $D_1^2D_2^3p$, where $p = 3x_1^2x_2^3$? We have $D_2p = 9x_2^2x_1^2$,
$D_2^2p = 18x_2x_1^2$, and so on, ending with $D_1D_1D_2D_2D_2p = 36$.

In multi-exponent notation, $p = 3x_1^2x_2^3$ is written $3\mathbf{x}^{(2,3)}$, i.e., $a_I\mathbf{x}^I$, where
$I = (2,3)$ and $a_{(2,3)} = 3$. The higher partial derivative $D_1^2D_2^3p$ is written
$D_{(2,3)}p$. By definition (Equation 3.3.10), when $I = (2,3)$, $I! = 2!\,3! = 12$.

Proposition 3.3.12 says

$$a_I = \frac{1}{I!}D_Ip(\mathbf{0}); \quad \text{here,} \quad \frac{1}{I!}D_{(2,3)}p(\mathbf{0}) = \frac{36}{12} = 3, \ \text{which is indeed } a_{(2,3)}.$$

What if the multi-exponent I for the higher partial derivatives is not the same
as the multi-exponent J for x? As mentioned in the proof of Proposition 3.3.12,
the result is 0. For example, if we take $D_1^2D_2^2$ of the polynomial $p = 3x_1^2x_2^3$, so
that $I = (2,2)$ and $J = (2,3)$, we get $36x_2$; evaluated at $\mathbf{x} = \mathbf{0}$, this becomes
0. If I > J, the result is also 0; for example, what is $D_Ip(\mathbf{0})$ when $I = (2,3)$,
$p = a_J\mathbf{x}^J$, $a_J = 3$, and $J = (2,2)$?¹⁴ △


14 This corresponds to $D_1^2D_2^3(3x_1^2x_2^2)$: already, $D_2^3(3x_1^2x_2^2) = 0$.
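Proposition 3.3.12 can be checked mechanically on the polynomial of Example 3.3.13. The sketch below stores a polynomial as a dictionary from multi-exponents to coefficients; that representation and the names `diff`, `D` and `at_zero` are choices made for this illustration.

```python
from math import factorial, prod

def diff(poly, var):
    """Differentiate a polynomial {multi-exponent: coefficient} with respect
    to variable number `var` (0-based)."""
    out = {}
    for I, a in poly.items():
        if I[var] > 0:
            J = I[:var] + (I[var] - 1,) + I[var + 1:]
            out[J] = out.get(J, 0) + a * I[var]
    return out

def D(poly, I):
    """Apply D_I = D_1^{i_1} ... D_n^{i_n} to poly."""
    for var, order in enumerate(I):
        for _ in range(order):
            poly = diff(poly, var)
    return poly

def at_zero(poly):
    """Value at 0: only the constant term survives."""
    return sum(a for I, a in poly.items() if not any(I))

p = {(2, 3): 3}                       # p = 3 x1^2 x2^3, as in Example 3.3.13
I = (2, 3)
I_fact = prod(factorial(i) for i in I)
print(at_zero(D(p, I)), I_fact * p[I])   # D_I p(0) and I! a_I
print(at_zero(D(p, (2, 2))))             # a multi-exponent different from J
```

The first line prints two equal numbers, and the second prints 0, matching the two cases discussed in the example.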



Although the polynomial in
Equation 3.3.30 is called the Taylor
polynomial of f at a, it is evaluated
at a + h, and its value there
depends on h, the increment to a.

In Equation 3.3.30, remember
that I is a multi-exponent; if you
want to write the polynomial out
in particular cases, it can get
complicated, especially if k or n is big.


Example 3.3.16 illustrates notation;
it has no mathematical
content.

The first term (the term of degree
m = 0) corresponds to the 0th
derivative, i.e., the function f itself.

Remember (Definition 3.3.7)
that $\mathbf{x}^I = x_1^{i_1} \cdots x_n^{i_n}$; similarly,
$\mathbf{h}^I = h_1^{i_1} \cdots h_n^{i_n}$. For instance, if
I = (1,1) we have

$\mathbf{h}^I = \mathbf{h}^{(1,1)} = h_1h_2$;
if I = (2, 0, 3) we have

$\mathbf{h}^I = \mathbf{h}^{(2,0,3)} = h_1^2h_3^3$.

Since the crossed partials of f
are equal,

$$D_{(1,1)}f(\mathbf{a})h_1h_2 = \tfrac{1}{2}D_1D_2f(\mathbf{a})h_1h_2 + \tfrac{1}{2}D_2D_1f(\mathbf{a})h_1h_2.$$

The term 1/I! in the formula for
the Taylor polynomial gives
appropriate weights to the various
terms to take into account the
existence of crossed partials.

This is the big advantage of
multi-exponent notation, which is
increasingly useful as n gets big:
it takes advantage of the existence
of crossed partials.

Taylor polynomials in higher dimensions

Now we are ready to define Taylor polynomials in higher dimensions, and to 
see in what sense they can be used to approximate functions in n variables. 

Definition 3.3.14 ($C^k$ function). A $C^k$ function on $U \subset \mathbb{R}^n$ is a function
that is k-times continuously differentiable, i.e., all of its partial derivatives
up to order k exist and are continuous on U.

Definition 3.3.15 (Taylor polynomial in higher dimensions). Let
$U \subset \mathbb{R}^n$ be an open subset and $f : U \to \mathbb{R}$ be a $C^k$ function. Then the
polynomial of degree k,

$$P^k_{f,\mathbf{a}}(\mathbf{a} + \mathbf{h}) = \sum_{m=0}^{k} \sum_{I \in \mathcal{I}_n^m} \frac{1}{I!}D_If(\mathbf{a})\mathbf{h}^I, \tag{3.3.30}$$

is called the Taylor polynomial of degree k of f at a.


Example 3.3.16 (Multi-exponent notation for a Taylor polynomial of
a function in two variables). Suppose f is a function in two variables. The
formula for the Taylor polynomial of degree 2 of f at a is then

$$P^2_{f,\mathbf{a}}(\mathbf{a} + \mathbf{h}) = \sum_{m=0}^{2} \sum_{I \in \mathcal{I}_2^m} \frac{1}{I!}D_If(\mathbf{a})\mathbf{h}^I \tag{3.3.31}$$

$$= \underbrace{\frac{1}{0!\,0!}D_{(0,0)}f(\mathbf{a})h_1^0h_2^0}_{f(\mathbf{a})} + \underbrace{\frac{1}{1!\,0!}D_{(1,0)}f(\mathbf{a})h_1^1h_2^0 + \frac{1}{0!\,1!}D_{(0,1)}f(\mathbf{a})h_1^0h_2^1}_{\text{terms of degree 1: first derivatives}}$$

$$+ \underbrace{\frac{1}{2!\,0!}D_{(2,0)}f(\mathbf{a})h_1^2h_2^0 + \frac{1}{1!\,1!}D_{(1,1)}f(\mathbf{a})h_1h_2 + \frac{1}{0!\,2!}D_{(0,2)}f(\mathbf{a})h_1^0h_2^2}_{\text{terms of degree 2: second derivatives}},$$

which we can write more simply as

$$P^2_{f,\mathbf{a}}(\mathbf{a} + \mathbf{h}) = f(\mathbf{a}) + D_{(1,0)}f(\mathbf{a})h_1 + D_{(0,1)}f(\mathbf{a})h_2 + \frac{1}{2}D_{(2,0)}f(\mathbf{a})h_1^2 + D_{(1,1)}f(\mathbf{a})h_1h_2 + \frac{1}{2}D_{(0,2)}f(\mathbf{a})h_2^2. \tag{3.3.32}$$

Remember that $D_{(1,0)}f$ corresponds to the partial derivative with respect to
the first variable, $D_1f$, while $D_{(0,1)}f$ corresponds to the partial derivative with
respect to the second variable, $D_2f$. Similarly, $D_{(1,1)}f$ corresponds to $D_1D_2f =
D_2D_1f$, and $D_{(2,0)}f$ corresponds to $D_1D_1f$. △

What are the terms of degree 2 (second derivatives) of the Taylor polynomial
at a, of degree 2, of a function with three variables?¹⁵


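Example 3.3.17 (Computing a Taylor polynomial). What is the Taylor
polynomial of degree 2 of the function $f\begin{pmatrix}x\\y\end{pmatrix} = \sin(x + y^2)$, at $\begin{pmatrix}0\\0\end{pmatrix}$? The first
term, of degree 0, is $f\begin{pmatrix}0\\0\end{pmatrix} = \sin 0 = 0$. For the terms of degree 1 we have

$$D_{(1,0)}f\begin{pmatrix}x\\y\end{pmatrix} = \cos(x + y^2) \quad \text{and} \quad D_{(0,1)}f\begin{pmatrix}x\\y\end{pmatrix} = 2y\cos(x + y^2), \tag{3.3.33}$$

so $D_{(1,0)}f\begin{pmatrix}0\\0\end{pmatrix} = 1$ and $D_{(0,1)}f\begin{pmatrix}0\\0\end{pmatrix} = 0$. For the terms of degree 2, we have

$$\begin{aligned} D_{(2,0)}f\begin{pmatrix}x\\y\end{pmatrix} &= -\sin(x + y^2) \\ D_{(1,1)}f\begin{pmatrix}x\\y\end{pmatrix} &= -2y\sin(x + y^2) \\ D_{(0,2)}f\begin{pmatrix}x\\y\end{pmatrix} &= 2\cos(x + y^2) - 4y^2\sin(x + y^2); \end{aligned} \tag{3.3.34}$$

evaluated at $\begin{pmatrix}0\\0\end{pmatrix}$, these give 0, 0, and 2 respectively. So the Taylor polynomial
of degree 2 is

$$P^2_{f,\mathbf{0}}\begin{pmatrix}h_1\\h_2\end{pmatrix} = 0 + h_1 + 0 + 0 + 0 + h_2^2. \tag{3.3.35}$$

In Example 3.4.5 we will see
how to reduce this computation to
two lines, using rules we will give
for computing Taylor polynomials.

What would we have to add to make this the Taylor polynomial of degree 3
of f at $\begin{pmatrix}0\\0\end{pmatrix}$? The third partial derivatives are

$$D_{(3,0)}f\begin{pmatrix}x\\y\end{pmatrix} = D_1D_1^2f = D_1\bigl(-\sin(x + y^2)\bigr) = -\cos(x + y^2)$$

$$D_{(0,3)}f\begin{pmatrix}x\\y\end{pmatrix} = D_2D_2^2f\begin{pmatrix}x\\y\end{pmatrix} = D_2\bigl(2\cos(x + y^2) - 4y^2\sin(x + y^2)\bigr)$$

$$= -4y\sin(x + y^2) - 8y\sin(x + y^2) - 8y^3\cos(x + y^2)$$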

15 The third term of

$$P^2_{f,\mathbf{a}}(\mathbf{a} + \mathbf{h}) = \sum_{m=0}^{2} \sum_{I \in \mathcal{I}_3^m} \frac{1}{I!}D_If(\mathbf{a})\mathbf{h}^I$$

is

$$\underbrace{D_{(1,1,0)}f(\mathbf{a})h_1h_2}_{D_1D_2} + \underbrace{D_{(1,0,1)}f(\mathbf{a})h_1h_3}_{D_1D_3} + \underbrace{D_{(0,1,1)}f(\mathbf{a})h_2h_3}_{D_2D_3} + \frac{1}{2}D_{(2,0,0)}f(\mathbf{a})h_1^2 + \frac{1}{2}D_{(0,2,0)}f(\mathbf{a})h_2^2 + \frac{1}{2}D_{(0,0,2)}f(\mathbf{a})h_3^2.$$

Taylor’s theorem with remainder
is discussed in Appendix A.9.


Note that since we are dividing
by a high power of |h|, the limit
being 0 means that the numerator
is very small.


We must require that the partial
derivatives be continuous;
if they aren’t, the statement isn’t
true even when k = 1, as you
will see if you go back to Equation
1.9.9, where f is the function
of Example 1.9.3, a function whose
partial derivatives are not continuous.


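$$D_{(2,1)}f\begin{pmatrix}x\\y\end{pmatrix} = D_1(D_1D_2)f\begin{pmatrix}x\\y\end{pmatrix} = D_1\bigl(-2y\sin(x + y^2)\bigr) = -2y\cos(x + y^2)$$

$$D_{(1,2)}f\begin{pmatrix}x\\y\end{pmatrix} = D_1D_2^2f\begin{pmatrix}x\\y\end{pmatrix} = D_1\bigl(2\cos(x + y^2) - 4y^2\sin(x + y^2)\bigr)$$

$$= -2\sin(x + y^2) - 4y^2\cos(x + y^2). \tag{3.3.36}$$

At $\begin{pmatrix}0\\0\end{pmatrix}$ all are 0 except $D_{(3,0)}$, which is −1. So the term of degree 3 is
$\bigl(-\frac{1}{3!}\bigr)h_1^3 = -\frac{1}{6}h_1^3$, and the Taylor polynomial of degree 3 of f at $\begin{pmatrix}0\\0\end{pmatrix}$ is

$$P^3_{f,\mathbf{0}}\begin{pmatrix}h_1\\h_2\end{pmatrix} = h_1 + h_2^2 - \frac{1}{6}h_1^3. \quad △ \tag{3.3.37}$$

Taylor’s theorem without remainder in higher dimensions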


Theorem 3.3.18 (Taylor’s theorem without remainder in higher
dimensions). (a) The polynomial $P^k_{f,\mathbf{a}}(\mathbf{a} + \mathbf{h})$ is the unique polynomial of
total degree k which has the same partial derivatives up to order k at a as
f.

(b) The polynomial $P^k_{f,\mathbf{a}}(\mathbf{a} + \mathbf{h})$ is the unique polynomial of degree ≤ k
that best approximates f when $\mathbf{h} \to \mathbf{0}$, in the sense that it is the unique
polynomial of degree ≤ k such that

$$\lim_{\mathbf{h} \to \mathbf{0}} \frac{f(\mathbf{a} + \mathbf{h}) - P^k_{f,\mathbf{a}}(\mathbf{a} + \mathbf{h})}{|\mathbf{h}|^k} = 0. \tag{3.3.38}$$

To prove Theorem 3.3.18 we need the following proposition, which says that 
if all the partial derivatives of / up to some order k equal 0 at a point a, then 
the function is small in a neighborhood of a. 

Proposition 3.3.19 (Size of a function with many vanishing partial
derivatives). Let U be an open subset of $\mathbb{R}^n$ and $f : U \to \mathbb{R}$ be a $C^k$
function. If at $\mathbf{a} \in U$ all partial derivatives up to order k vanish (including
the 0th partial derivative, i.e., $f(\mathbf{a})$), then

$$\lim_{\mathbf{h} \to \mathbf{0}} \frac{f(\mathbf{a} + \mathbf{h})}{|\mathbf{h}|^k} = 0. \tag{3.3.39}$$

The expression in Equation
3.3.41 is $D_Ip(\mathbf{0})$, where $p = Q^k_{f,\mathbf{a}}$.

We get the equality of Equation
3.3.41 by the same argument as
in the proof of Proposition 3.3.12:
all partial derivatives where $I \ne J$
vanish.


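Proof of Theorem 3.3.18. Part (a) follows from Proposition 3.3.12. Consider
the polynomial $Q^k_{f,\mathbf{a}}$ that, evaluated at h, gives the same result as the
Taylor polynomial $P^k_{f,\mathbf{a}}$ evaluated at a + h:

$$P^k_{f,\mathbf{a}}(\mathbf{a} + \mathbf{h}) = Q^k_{f,\mathbf{a}}(\mathbf{h}) = \sum_{m=0}^{k} \sum_{J \in \mathcal{I}_n^m} \frac{1}{J!}D_Jf(\mathbf{a})\mathbf{h}^J. \tag{3.3.40}$$

Now consider the Ith derivative of that polynomial, at 0:

$$\underbrace{D_I\Bigl(\overbrace{\sum_{m=0}^{k} \sum_{J \in \mathcal{I}_n^m} \underbrace{\frac{1}{J!}D_Jf(\mathbf{a})}_{\text{coefficient of } p}\mathbf{h}^J}^{p = Q^k_{f,\mathbf{a}}}\Bigr)(\mathbf{0})}_{D_Ip(\mathbf{0})} = \frac{1}{I!}D_If(\mathbf{a})\,D_I\mathbf{h}^I(\mathbf{0}). \tag{3.3.41}$$

Proposition 3.3.12 says that for a polynomial p, we have $I!\,a_I = D_Ip(\mathbf{0})$,
where the $a_I$ are the coefficients. This gives

$$\underbrace{I!\,\overbrace{\frac{1}{I!}\bigl(D_If(\mathbf{a})\bigr)}^{I\text{th coeff. of } Q^k_{f,\mathbf{a}}}}_{I!\,a_I} = \underbrace{D_IQ^k_{f,\mathbf{a}}(\mathbf{0})}_{D_Ip(\mathbf{0})}; \quad \text{i.e.,} \quad D_If(\mathbf{a}) = D_IQ^k_{f,\mathbf{a}}(\mathbf{0}). \tag{3.3.42}$$

Now, when h = 0, then $P^k_{f,\mathbf{a}}(\mathbf{a} + \mathbf{h})$ becomes $P^k_{f,\mathbf{a}}(\mathbf{a})$, so

$$D_IQ^k_{f,\mathbf{a}}(\mathbf{0}) = D_IP^k_{f,\mathbf{a}}(\mathbf{a}), \quad \text{so} \quad D_IP^k_{f,\mathbf{a}}(\mathbf{a}) = D_If(\mathbf{a}); \tag{3.3.43}$$

the partial derivatives of $P^k_{f,\mathbf{a}}$, up to order k, are the same as the partial
derivatives of f, up to order k. Therefore all the partials of order at most k of the
difference $f(\mathbf{a} + \mathbf{h}) - P^k_{f,\mathbf{a}}(\mathbf{a} + \mathbf{h})$ vanish.

Part (b) then follows from Proposition 3.3.19. To lighten the notation, denote
by $g(\mathbf{a} + \mathbf{h})$ the difference between $f(\mathbf{a} + \mathbf{h})$ and the Taylor polynomial of f at
a. Since all the partials of g up to order k vanish, Proposition 3.3.19 says that

$$\lim_{\mathbf{h} \to \mathbf{0}} \frac{g(\mathbf{a} + \mathbf{h})}{|\mathbf{h}|^k} = 0. \quad □ \tag{3.3.44}$$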

3.4 Rules for Computing Taylor Polynomials 

Computing Taylor polynomials is very much like computing derivatives; in
fact, when the degree is 1, they are essentially the same. Just as we have rules for
differentiating sums, products, compositions, etc., there are rules for comput- 
ing Taylor polynomials of functions obtained by combining simpler functions. 
Since computing partial derivatives rapidly becomes unpleasant, we strongly 
recommend making use of these rules. 



“Since the computation of successive
derivatives is always painful,
we recommend (when it is possible)
considering the function as
being obtained from simpler functions
by elementary operations
(sum, product, power, etc). ...
Taylor polynomials are most often
a theoretical, not a practical,
tool.” Jean Dieudonné, Calcul
Infinitésimal


A famous example of an asymptotic
development is the prime
number theorem, which states that
if π(x) represents the number of
prime numbers smaller than x,
then, for x near ∞,

$$\pi(x) = \frac{x}{\log x} + o\left(\frac{x}{\log x}\right).$$

(Here π has nothing to do with
π ≈ 3.1415.) This was proved
independently in 1896 by Hadamard
and de la Vallée-Poussin, after being
conjectured a century earlier
by Gauss.

Anyone who proves the stronger
statement,

$$\pi(x) = \int_2^x \frac{1}{\log u}\,du + o\left(|x|^{\frac{1}{2} + \epsilon}\right),$$

for all ε > 0 will have proved
the Riemann hypothesis, one of
the two most famous outstanding
problems of mathematics, the
other being the Poincaré conjecture.

To write down the Taylor polynomials of some standard functions, we will
use notation invented by Landau to express the idea that one is computing “up
to terms of degree k”: the notation o, or “little o.” While in the equations of
Proposition 3.4.2 the “little o” term may look like a remainder, such terms do
not give a precise, computable remainder. Little o provides a way to bound
one function by another function, in an unspecified neighborhood of the point
at which you are computing the Taylor polynomial.


Definition 3.4.1 (Little o). Little o, denoted o, means “smaller than,” in
the following sense: if h(x) > 0 in some neighborhood of 0, then $f \in o(h)$ if
for all ε > 0, there exists δ > 0 such that if |x| < δ, then

$$|f(x)| \le \epsilon h(x). \tag{3.4.1}$$

Alternatively, we can say that $f \in o(h)$ if

$$\lim_{x \to 0} \frac{f(x)}{h(x)} = 0; \tag{3.4.2}$$

in some unspecified neighborhood, |f| is smaller than h; as x → 0, |f(x)|
becomes infinitely smaller than h(x).

Very often Taylor polynomials written in terms of bounds with little o are 
good enough. But in settings where you want to know the error for some 
particular x, something stronger is required: Taylor’s theorem with remainder, 
discussed in Appendix A.9. 
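The limit criterion of Equation 3.4.2 is easy to probe numerically. In the sketch below, the function sin(x) − x and the comparison functions |x|² and |x|³ are illustrative choices made here, and the helper name `ratio` is hypothetical.

```python
import math

def ratio(f, h, x):
    """|f(x)| / h(x); f is in o(h) when this tends to 0 as x -> 0."""
    return abs(f(x)) / h(x)

def f(x):
    return math.sin(x) - x            # illustrative choice of function

def h2(x):
    return abs(x) ** 2

def h3(x):
    return abs(x) ** 3

xs = (1e-1, 1e-2, 1e-3)
r2 = [ratio(f, h2, x) for x in xs]    # shrinks toward 0: consistent with f in o(|x|^2)
r3 = [ratio(f, h3, x) for x in xs]    # levels off near 1/6: f is not in o(|x|^3)
print(r2)
print(r3)
```

A table of ratios like this can suggest which power of |x| bounds a function, but it says nothing about the size of the neighborhood in which the bound holds, which is exactly the information little o leaves unspecified.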


Remark. In the setting of functions that can be approximated by Taylor polynomials, the only functions $h(x)$ of interest are the functions $|x|^k$ for $k \ge 0$. In other settings, it is interesting to compare nastier functions (not of class $C^k$) to a broader class of functions; for instance, one might be interested in bounding functions by functions $h(x)$ such as $\sqrt{|x|}$ or $|x| \log |x|$. (An example of what we mean by “nastier functions” is Equation 5.3.10.) The art of making such comparisons is called the theory of asymptotic developments. But any place that a function is $C^k$ it has to look like a positive integer power of $x$. △

In Proposition 3.4.2 we list the functions whose Taylor polynomials we expect you to know from first year calculus. We will write them only near 0, but by translation they can be written anywhere. Note that in the equations of Proposition 3.4.2, the Taylor polynomial is the expression on the right-hand side excluding the little o term, which indicates how good an approximation the Taylor polynomial is to the corresponding function, without giving any precision.

Equation 3.4.7 is the binomial formula.


Propositions 3.4.3 and 3.4.4 are stated for scalar-valued functions, largely because we only defined Taylor polynomials for scalar-valued functions. However, they are true for vector-valued functions, at least whenever the latter make sense. For instance, the product should be replaced by a dot product (or the product of a scalar with a vector-valued function). When composing functions, of course we can consider only compositions where the range of one function is the domain of the other. The proofs of all these variants are practically identical to the proofs given here.


Proposition 3.4.2 (Taylor polynomials of some standard functions). The following formulas give the Taylor polynomials of the corresponding functions:

\[ e^x = 1 + x + \frac{x^2}{2!} + \dots + \frac{x^n}{n!} + o(x^n) \tag{3.4.3} \]

\[ \sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots + (-1)^n \frac{x^{2n+1}}{(2n+1)!} + o(x^{2n+1}) \tag{3.4.4} \]

\[ \cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots + (-1)^n \frac{x^{2n}}{(2n)!} + o(x^{2n}) \tag{3.4.5} \]

\[ \log(1 + x) = x - \frac{x^2}{2} + \dots + (-1)^{n+1} \frac{x^n}{n} + o(x^n) \tag{3.4.6} \]

\[ (1 + x)^m = 1 + mx + \frac{m(m-1)}{2!} x^2 + \frac{m(m-1)(m-2)}{3!} x^3 + \dots + \frac{m(m-1)\cdots(m-(n-1))}{n!} x^n + o(x^n). \tag{3.4.7} \]


The proof is left as Exercise 3.4.1. Note that the Taylor polynomial for sine contains only odd terms, with alternating signs, while the Taylor polynomial for cosine contains only even terms, again with alternating signs. All odd functions (functions $f$ such that $f(-x) = -f(x)$) have Taylor polynomials with only odd terms, and all even functions (functions $f$ such that $f(-x) = f(x)$) have Taylor polynomials with only even terms. Note also that in the Taylor polynomial of $\log(1 + x)$, there are no factorials in the denominators.
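The little o statement for sine can be observed numerically. The following is only an illustrative sketch: the helper name `sin_taylor` and the sample points are assumptions made for the example, not part of the text. It computes the remainder $\sin x - P(x)$ for the Taylor polynomial of degree $2n+1$ and divides by $|x|^{2n+1}$; if the remainder is in $o(x^{2n+1})$, these ratios should shrink as $x$ approaches 0.

```python
import math

def sin_taylor(x, n):
    """Taylor polynomial of sin at 0, up to the term of degree 2n+1."""
    return sum((-1) ** j * x ** (2 * j + 1) / math.factorial(2 * j + 1)
               for j in range(n + 1))

n = 2  # the polynomial x - x^3/3! + x^5/5!
ratios = []
for x in (0.5, 0.25, 0.125):
    remainder = math.sin(x) - sin_taylor(x, n)
    # remainder divided by |x|^(2n+1); should tend to 0 with x
    ratios.append(abs(remainder) / abs(x) ** (2 * n + 1))

print(ratios)
```

The printed ratios decrease as $x$ is halved, which is what membership in $o(x^{2n+1})$ predicts; the sketch does not, of course, give a precise bound on the remainder.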

Now let us see how to combine these Taylor polynomials. 

Proposition 3.4.3 (Sums and products of Taylor polynomials). Let $U \subset \mathbb{R}^n$ be open, and $f, g : U \to \mathbb{R}$ be $C^k$ functions. Then $f + g$ and $fg$ are also of class $C^k$, and their Taylor polynomials are computed as follows.

(a) The Taylor polynomial of the sum is the sum of the Taylor polynomials:

\[ P^k_{f+g,a}(a + h) = P^k_{f,a}(a + h) + P^k_{g,a}(a + h). \tag{3.4.8} \]

(b) The Taylor polynomial of the product $fg$ is obtained by taking the product

\[ P^k_{f,a}(a + h) \cdot P^k_{g,a}(a + h) \tag{3.4.9} \]

and discarding the terms of degree $> k$.

Please notice that the composition of two polynomials is a polynomial.

Why does the composition in Proposition 3.4.4 make sense? $P^k_{f,b}(b + u)$ is a good approximation to $f(b + u)$ only when $|u|$ is small. But our requirement that $g(a) = b$ guarantees precisely that

\[ P^k_{g,a}(a + h) = b + \text{something small} \]

when $h$ is small. So it is reasonable to substitute that “something small” for the increment $u$ when evaluating the polynomial $P^k_{f,b}(b + u)$.


Proposition 3.4.4 (Chain rule for Taylor polynomials). Let $U \subset \mathbb{R}^n$ and $V \subset \mathbb{R}$ be open, and $g : U \to V$ and $f : V \to \mathbb{R}$ be of class $C^k$. Then $f \circ g : U \to \mathbb{R}$ is of class $C^k$, and if $g(a) = b$, then the Taylor polynomial $P^k_{f \circ g, a}(a + h)$ is obtained by considering the polynomial

\[ P^k_{f,b}\bigl(P^k_{g,a}(a + h)\bigr) \]

and discarding the terms of degree $> k$.


Example 3.4.5 (Computing a Taylor polynomial: an easy example).

Let's use these rules to compute the Taylor polynomial of degree 3 of the function $f\begin{pmatrix} x \\ y \end{pmatrix} = \sin(x + y^2)$ at the origin, which we already saw in Example 3.3.17. According to Proposition 3.4.4, we simply substitute $x + y^2$ for $u$ in $\sin u = u - u^3/6 + o(u^3)$, and omit all the terms of degree $> 3$:

\[ \begin{aligned} \sin(x + y^2) &= (x + y^2) - \frac{(x + y^2)^3}{6} + o\bigl((x^2 + y^2)^{3/2}\bigr) \\ &= \underbrace{x + y^2 - \frac{x^3}{6}}_{\text{Taylor polynomial}} + \underbrace{o\bigl((x^2 + y^2)^{3/2}\bigr)}_{\text{error term}}. \end{aligned} \tag{3.4.10} \]

Presto: half a page becomes two lines.
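The rules of Propositions 3.4.3 and 3.4.4 are mechanical enough to sketch in a few lines of code. In the sketch below, a polynomial in $x$ and $y$ is stored as a dictionary mapping exponent pairs `(i, j)` to coefficients; this representation and the names `truncate`, `add`, `mul` and `compose` are assumptions made for the illustration, not notation from the text. Products discard terms of degree $> k$ as they are formed, and composition substitutes the inner polynomial (which must have zero constant term here) into a one-variable polynomial.

```python
from fractions import Fraction

def truncate(p, k):
    """Drop zero coefficients and all terms of total degree > k."""
    return {e: c for e, c in p.items() if sum(e) <= k and c != 0}

def add(p, q):
    r = dict(p)
    for e, c in q.items():
        r[e] = r.get(e, 0) + c
    return r

def mul(p, q, k):
    """Product of two polynomials, discarding terms of degree > k."""
    r = {}
    for (i1, j1), c1 in p.items():
        for (i2, j2), c2 in q.items():
            e = (i1 + i2, j1 + j2)
            if sum(e) <= k:
                r[e] = r.get(e, 0) + c1 * c2
    return truncate(r, k)

def compose(outer, inner, k):
    """Substitute `inner` for u in sum(outer[m] * u**m), truncated at degree k."""
    result = {}
    power = {(0, 0): Fraction(1)}  # inner**0
    for c in outer:
        result = add(result, {e: c * v for e, v in power.items()})
        power = mul(power, inner, k)
    return truncate(result, k)

k = 3
sin_u = [Fraction(0), Fraction(1), Fraction(0), Fraction(-1, 6)]  # u - u^3/6
inner = {(1, 0): Fraction(1), (0, 2): Fraction(1)}                # x + y^2
taylor = compose(sin_u, inner, k)
print(taylor)
```

The dictionary `taylor` holds exactly the three terms $x$, $y^2$ and $-x^3/6$ of Equation 3.4.10. Truncating intermediate products is safe because multiplying by further terms can only raise the degree.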


Whenever you are trying to compute the Taylor polynomial of a quotient, a good tactic is to factor out the constant terms (here, $f(a) + f(b)$), and apply Equation 3.4.7 to what remains.


Example 3.4.6 (Computing a Taylor polynomial: a harder example).

Let $U \subset \mathbb{R}$ be open, and $f : U \to \mathbb{R}$ be of class $C^2$. Let $V \subset U \times U$ be the subset of $\mathbb{R}^2$ where $f(x) + f(y) \ne 0$. Compute the Taylor polynomial of degree 2 of the function $F : V \to \mathbb{R}$, at a point $\begin{pmatrix} a \\ b \end{pmatrix} \in V$.

\[ F\begin{pmatrix} x \\ y \end{pmatrix} = \frac{1}{f(x) + f(y)}. \tag{3.4.11} \]

Choose $\begin{pmatrix} a \\ b \end{pmatrix} \in V$, and set $\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a + u \\ b + v \end{pmatrix}$. Then

\[ F\begin{pmatrix} a + u \\ b + v \end{pmatrix} = \frac{1}{\bigl(f(a) + f'(a)u + f''(a)u^2/2 + o(u^2)\bigr) + \bigl(f(b) + f'(b)v + f''(b)v^2/2 + o(v^2)\bigr)} \]

\[ = \underbrace{\frac{1}{f(a) + f(b)}}_{\text{a constant}} \; \underbrace{\frac{1}{1 + \dfrac{f'(a)u + f''(a)u^2/2 + f'(b)v + f''(b)v^2/2}{f(a) + f(b)}}}_{(1+x)^{-1},\ \text{where } x \text{ is the fraction in the denominator}} + \; o(u^2 + v^2). \tag{3.4.12} \]

The fact that

\[ (1 + x)^{-1} = 1 - x + x^2 - \dots \]

is a special case of Equation 3.4.7, where $m = -1$. We already saw this case in Example 0.4.9, where we had the geometric series

\[ \sum_{n=0}^{\infty} r^n = \frac{1}{1 - r}. \]

The point of this is that the second factor is something of the form $(1 + x)^{-1} = 1 - x + x^2 - \dots$, leading to

\[ F\begin{pmatrix} a + u \\ b + v \end{pmatrix} = \frac{1}{f(a) + f(b)} \left( 1 - \frac{f'(a)u + f''(a)u^2/2 + f'(b)v + f''(b)v^2/2}{f(a) + f(b)} + \left( \frac{f'(a)u + f''(a)u^2/2 + f'(b)v + f''(b)v^2/2}{f(a) + f(b)} \right)^2 - \dots \right). \tag{3.4.13} \]

In this expression, we should discard the terms of degree $> 2$, to find

\[ P^2_{F,\binom{a}{b}}\begin{pmatrix} a + u \\ b + v \end{pmatrix} = \frac{1}{f(a) + f(b)} - \frac{f'(a)u + f'(b)v}{(f(a) + f(b))^2} - \frac{f''(a)u^2 + f''(b)v^2}{2(f(a) + f(b))^2} + \frac{(f'(a)u + f'(b)v)^2}{(f(a) + f(b))^3}. \tag{3.4.14} \]
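A quick numerical sanity check of the degree 2 polynomial for $F = 1/(f(x) + f(y))$ is sketched below. The choice $f = \exp$ (so that $f = f' = f''$), the base point, and the direction of the increment are all assumptions picked only for the illustration. The polynomial coded in `P2` is $1/S - L/S^2 - Q/(2S^2) + L^2/S^3$, with $S = f(a) + f(b)$, $L = f'(a)u + f'(b)v$ and $Q = f''(a)u^2 + f''(b)v^2$. If it is the Taylor polynomial, the error divided by $u^2 + v^2$ should tend to 0.

```python
import math

def F(x, y):
    return 1.0 / (math.exp(x) + math.exp(y))

def P2(a, b, u, v):
    """Degree 2 Taylor polynomial of F at (a, b), for f = exp."""
    fa, fb = math.exp(a), math.exp(b)    # f = f' = f'' for exp
    s = fa + fb                          # the constant f(a) + f(b)
    lin = fa * u + fb * v                # f'(a)u + f'(b)v
    quad = fa * u * u + fb * v * v       # f''(a)u^2 + f''(b)v^2
    return 1 / s - lin / s**2 - quad / (2 * s**2) + lin**2 / s**3

a, b = 0.2, -0.1
ratios = []
for t in (1e-1, 1e-2, 1e-3):
    u, v = t, -2 * t
    err = abs(F(a + u, b + v) - P2(a, b, u, v))
    ratios.append(err / (u * u + v * v))

print(ratios)
```

The ratios shrink roughly in proportion to the size of the increment, consistent with an error term in $o(u^2 + v^2)$.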


Taylor polynomials of implicit functions 


Among the functions whose Taylor polynomials we are particularly interested 
in are those furnished by the inverse and implicit function theorems. Although 
these functions are only known via some limit process like Newton’s method, 
their Taylor polynomials can be computed algebraically. 

Assume we are in the setting of the implicit function theorem (Theorem 2.9.10), where we have an implicit function $g$ such that

\[ F\begin{pmatrix} g(y) \\ y \end{pmatrix} = 0 \quad \text{for all } y \text{ in some neighborhood of } b. \]


It follows from Theorem 3.4.7 that if you write the Taylor polynomial of the implicit function with undetermined coefficients, insert it into the equation specifying the implicit function, and identify like terms, you will be able to determine the coefficients.


Theorem 3.4.7 (Taylor polynomials of implicit functions). If $F$ is of class $C^k$ for some $k \ge 1$, then $g$ is also of class $C^k$, and its Taylor polynomial of degree $k$ is the unique polynomial mapping $p : \mathbb{R}^n \to \mathbb{R}^m$ of degree at most $k$ such that

\[ F\begin{pmatrix} p(b + u) \\ b + u \end{pmatrix} \in o(|u|^k). \tag{3.4.15} \]


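Example 3.4.8 (Taylor polynomial of an implicit function). The equation $F\begin{pmatrix} x \\ y \\ z \end{pmatrix} = x^2 + y^3 + xyz^3 - 3 = 0$ determines $z$ as a function of $x$ and $y$ in a neighborhood of $\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$, since $D_3 F\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = 3 \ne 0$. Let us compute the Taylor polynomial of this implicit function $g$ to degree 2.

290 Chapter 3. Higher Derivatives, Quadratic Forms, Manifolds

We will set

\[ z = g\begin{pmatrix} x \\ y \end{pmatrix} = g\begin{pmatrix} 1 + u \\ 1 + v \end{pmatrix} = 1 + a_1 u + a_2 v + \frac{a_{1,1}}{2} u^2 + a_{1,2}\, uv + \frac{a_{2,2}}{2} v^2 + o(u^2 + v^2). \tag{3.4.16} \]

Inserting this expression for $z$ into $x^2 + y^3 + xyz^3 - 3 = 0$ leads to

\[ (1 + u)^2 + (1 + v)^3 + (1 + u)(1 + v)\left(1 + a_1 u + a_2 v + \frac{a_{1,1}}{2} u^2 + a_{1,2}\, uv + \frac{a_{2,2}}{2} v^2\right)^3 - 3 \in o(u^2 + v^2). \]

The linear terms could have been derived from Equation 2.9.25, which gives in this case

\[ \left[Dg\begin{pmatrix} 1 \\ 1 \end{pmatrix}\right] = -[3]^{-1}[3, 4] = -[1/3][3, 4] = [-1, -4/3]. \]

Now it is a matter of multiplying out and identifying like terms. We get:

Constant terms: $3 - 3 = 0$.

Linear terms:

\[ 2u + 3v + u + v + 3a_1 u + 3a_2 v = 0, \quad \text{i.e.,} \quad a_1 = -1, \ a_2 = -\frac{4}{3}. \]

Quadratic terms:

\[ u^2\left(1 + 3a_1 + 3a_1^2 + \frac{3}{2} a_{1,1}\right) + v^2\left(3 + 3a_2 + 3a_2^2 + \frac{3}{2} a_{2,2}\right) + uv\left(1 + 3a_1 + 3a_2 + 6a_1 a_2 + 3a_{1,2}\right). \]

Setting these coefficients equal to 0, and using $a_1 = -1$ and $a_2 = -4/3$, now gives

\[ a_{1,1} = -2/3, \quad a_{2,2} = -26/9, \quad a_{1,2} = -2/3. \tag{3.4.17} \]

(For the last value: $1 + 3a_1 + 3a_2 + 6a_1 a_2 = 1 - 3 - 4 + 8 = 2$, so $2 + 3a_{1,2} = 0$.)

Finally, this gives the Taylor polynomial of $g$:

\[ g\begin{pmatrix} x \\ y \end{pmatrix} = 1 - (x - 1) - \frac{4}{3}(y - 1) - \frac{1}{3}(x - 1)^2 - \frac{13}{9}(y - 1)^2 - \frac{2}{3}(x - 1)(y - 1) + o\bigl((x - 1)^2 + (y - 1)^2\bigr). \tag{3.4.18} \]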
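The identification of like terms in this example can be checked with exact rational arithmetic. The sketch below is an illustration under stated assumptions: polynomials in $u, v$ are stored as dictionaries from exponent pairs to coefficients, and the helper names `mul`, `add` and `residual` are hypothetical. It inserts $z = 1 + a_1 u + a_2 v$ into $x^2 + y^3 + xyz^3 - 3$ with $x = 1 + u$, $y = 1 + v$, keeps terms of degree $\le 2$, and then uses the fact that each quadratic coefficient of $z$ enters the residual linearly with slope $D_3F = 3$.

```python
from fractions import Fraction as Fr

def mul(p, q, k=2):
    """Product of two polynomials in (u, v), discarding terms of degree > k."""
    r = {}
    for (i1, j1), c1 in p.items():
        for (i2, j2), c2 in q.items():
            e = (i1 + i2, j1 + j2)
            if sum(e) <= k:
                r[e] = r.get(e, 0) + c1 * c2
    return r

def add(*ps):
    r = {}
    for p in ps:
        for e, c in p.items():
            r[e] = r.get(e, 0) + c
    return r

def residual(z):
    """Terms of degree <= 2 of x^2 + y^3 + x*y*z^3 - 3, with x = 1+u, y = 1+v."""
    x = {(0, 0): Fr(1), (1, 0): Fr(1)}
    y = {(0, 0): Fr(1), (0, 1): Fr(1)}
    z3 = mul(mul(z, z), z)
    return add(mul(x, x), mul(mul(y, y), y), mul(mul(x, y), z3), {(0, 0): Fr(-3)})

# Linear coefficients from the linear terms; quadratic ones start at 0.
a1, a2 = Fr(-1), Fr(-4, 3)
z = {(0, 0): Fr(1), (1, 0): a1, (0, 1): a2}
r = residual(z)

# The u^2, uv, v^2 coefficients of z are a11/2, a12, a22/2; each enters with slope 3.
a11 = -r[(2, 0)] / 3 * 2
a12 = -r[(1, 1)] / 3
a22 = -r[(0, 2)] / 3 * 2

z.update({(2, 0): a11 / 2, (1, 1): a12, (0, 2): a22 / 2})
check = residual(z)  # every coefficient should now vanish
print(a11, a12, a22)
```

Under these assumptions the sketch reports $a_{1,1} = -2/3$, $a_{1,2} = -2/3$ and $a_{2,2} = -26/9$, and the residual `check` has no nonzero term of degree $\le 2$.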


3.5 Quadratic Forms 


Exercises 3.5.1 and 3.5.2 give a more intrinsic definition of a quadratic form on an abstract vector space.


A quadratic form is a polynomial all of whose terms are of degree 2. For instance, $x^2 + y^2$ and $xy$ are quadratic forms in two variables, as is $4x^2 + xy - y^2$. The polynomial $xz$ is also a quadratic form (probably in three variables). But $xyz$ is not a quadratic form; it is a cubic form in three variables.

Definition 3.5.1 (Quadratic form). A quadratic form $Q : \mathbb{R}^n \to \mathbb{R}$ is a polynomial in the variables $x_1, \dots, x_n$ all of whose terms are of degree 2.


Although we will spend much of this section working on quadratic forms that look like $x^2 + y^2$ or $4x^2 + xy - y^2$, the following is a more realistic example. Most often, the quadratic forms one encounters in practice are integrals of functions, often functions in higher dimensions.


Example 3.5.2 (An integral as a quadratic form). The integral

\[ Q(p) = \int_0^1 (p(t))^2 \, dt, \tag{3.5.1} \]

The quadratic form of Example 3.5.2 is absolutely fundamental in physics. The energy of an electromagnetic field is the integral of the square of the field, so if $p$ is the electromagnetic field, the quadratic form $Q(p)$ gives the amount of energy between 0 and 1.


A famous theorem due to Fermat (Fermat's theorem on sums of two squares) asserts that a prime number $p \ne 2$ can be written as a sum of two squares if and only if the remainder after dividing $p$ by 4 is 1. The proof of this and a world of analogous results (due to Fermat, Euler, Lagrange, Legendre, Gauss, Dirichlet, Kronecker, ...) led to algebraic number theory and the development of abstract algebra.


In contrast, no one knows anything about cubic forms. This has ramifications for the understanding of manifolds. The abstract, algebraic view of a four-dimensional manifold is that it is a quadratic form over the integers; because integral quadratic forms are so well understood, a great deal of progress has been made in understanding 4-manifolds. But even the foremost researchers don't know how to approach six-dimensional manifolds; that would require knowing something about cubic forms.

where $p$ is the polynomial $p(t) = a_0 + a_1 t + a_2 t^2$, is a quadratic form, as we can confirm by computing the integral:

\[ \begin{aligned} Q(p) &= \int_0^1 (a_0 + a_1 t + a_2 t^2)^2 \, dt \\ &= \int_0^1 \bigl(a_0^2 + a_1^2 t^2 + a_2^2 t^4 + 2a_0 a_1 t + 2a_0 a_2 t^2 + 2a_1 a_2 t^3\bigr) \, dt \\ &= \bigl[a_0^2 t\bigr]_0^1 + \left[\frac{a_1^2 t^3}{3}\right]_0^1 + \left[\frac{a_2^2 t^5}{5}\right]_0^1 + \left[\frac{2a_0 a_1 t^2}{2}\right]_0^1 + \left[\frac{2a_0 a_2 t^3}{3}\right]_0^1 + \left[\frac{2a_1 a_2 t^4}{4}\right]_0^1 \\ &= a_0^2 + \frac{a_1^2}{3} + \frac{a_2^2}{5} + a_0 a_1 + \frac{2a_0 a_2}{3} + \frac{a_1 a_2}{2}. \end{aligned} \tag{3.5.2} \]

Above, $p$ is a quadratic polynomial, but $Q(p)$ is a quadratic form if $p$ is a polynomial of any degree, not just quadratic. This is obvious if $p$ is linear: if $a_2 = 0$, Equation 3.5.2 becomes $Q(p) = a_0^2 + a_1^2/3 + a_0 a_1$. Exercise 3.5.3 asks you to show that $Q$ is a quadratic form if $p$ is a cubic polynomial. △
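The computation of the integral can be reproduced exactly, which also makes it plausible that $Q$ is a quadratic form in the coefficients for $p$ of any degree. The sketch below is illustrative only; the function names `Q` and `formula_3_5_2` are assumptions. It uses $\int_0^1 t^{i+j}\,dt = 1/(i+j+1)$, so that $Q(p)$ is a sum of products $a_i a_j$ with fixed rational weights.

```python
from fractions import Fraction

def Q(coeffs):
    """Integral over [0, 1] of p(t)^2, where p(t) = sum(coeffs[i] * t**i)."""
    total = Fraction(0)
    for i, ai in enumerate(coeffs):
        for j, aj in enumerate(coeffs):
            # the integral of t^(i+j) over [0, 1] is 1/(i+j+1)
            total += Fraction(ai) * Fraction(aj) / (i + j + 1)
    return total

def formula_3_5_2(a0, a1, a2):
    """Right-hand side of the closed form for a quadratic polynomial p."""
    a0, a1, a2 = map(Fraction, (a0, a1, a2))
    return a0**2 + a1**2 / 3 + a2**2 / 5 + a0 * a1 + 2 * a0 * a2 / 3 + a1 * a2 / 2

print(Q([1, 2, 1]), formula_3_5_2(1, 2, 1))
```

Every term of the double sum has degree 2 in the coefficients, whatever the length of `coeffs`, and doubling all the coefficients multiplies the result by 4.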

In various guises, quadratic forms have been an important part of mathematics since the ancient Greeks. The quadratic formula, always the centerpiece of high school math, is one aspect.

A much deeper problem is the question: what whole numbers $a$ can be written in the form $x^2 + y^2$? Of course any number $a \ge 0$ can be written $(\sqrt{a})^2 + 0^2$, but suppose you impose that $x$ and $y$ be whole numbers. For instance, $2^2 + 1^2 = 5$, so that 5 can be written as a sum of two squares, but 3 and 7 cannot.

The classification of quadratic forms over the integers is thus a deep and difficult problem, though now reasonably well understood. But the classification over the reals, where we are allowed to extract square roots of positive numbers, is relatively easy. We will be discussing quadratic forms over the reals. In particular, we will be interested in classifying such quadratic forms by associating to each quadratic form two integers, together called its signature.

In Section 3.6 we will see that quadratic forms can be used to analyze the behavior of a function at a critical point: the signature of a quadratic form will enable us to determine whether the critical point is a maximum, a minimum or some flavor of saddle, where the function goes up in some directions and down in others, as in a mountain pass.


Quadratic forms as sums of squares 

Essentially everything there is to say about real quadratic forms is summed up 
by Theorem 3.5.3, which says that a quadratic form can be represented as a 
sum of squares of linearly independent linear functions of the variables.

We know that $m \le n$, since there can't be more than $n$ linearly independent linear functions on $\mathbb{R}^n$ (Exercise 2.6.3).

The term “sum of squares” is traditional; it would perhaps be more accurate to call it a combination of squares, since some squares may be subtracted rather than added.


Of course more than one quadratic form can have the same signature. The quadratic forms in Examples 3.5.6 and 3.5.7 below both have signature (2, 1).


The key point is that $ax^2 + Bx$ can be rewritten

\[ ax^2 + Bx + \left(\frac{B}{2\sqrt{a}}\right)^2 - \left(\frac{B}{2\sqrt{a}}\right)^2 = \left(\sqrt{a}\,x + \frac{B}{2\sqrt{a}}\right)^2 - \left(\frac{B}{2\sqrt{a}}\right)^2. \]

(We have written $a$ lower case and $B$ upper case because in our applications, $a$ will be a number, but $B$ will be a linear function.)


Theorem 3.5.3 (Quadratic forms as sums of squares). (a) For any quadratic form $Q(x)$ on $\mathbb{R}^n$, there exist $m$ linearly independent linear functions $\alpha_1(x), \dots, \alpha_m(x)$ such that

\[ Q(x) = (\alpha_1(x))^2 + \dots + (\alpha_k(x))^2 - (\alpha_{k+1}(x))^2 - \dots - (\alpha_{k+l}(x))^2. \tag{3.5.3} \]

(b) The number $k$ of plus signs and the number $l$ of minus signs in a decomposition like that of Equation 3.5.3 depend only on $Q$ and not on the specific linear functions chosen.


Definition 3.5.4 (Signature). The signature of a quadratic form is the pair of integers $(k, l)$.


The word suggests, correctly, that the signature remains unchanged regardless of how the quadratic form is decomposed into a sum of squares of linearly independent linear functions; it suggests, incorrectly, that the signature identifies a quadratic form.

Before giving a proof, or even a precise definition of the terms involved, we 
want to give some examples of the main technique used in the proof; a careful 
look at these examples should make the proof almost redundant. 


Completing squares to prove the quadratic formula 


The proof is provided by an algorithm for finding the linearly independent functions $\alpha_i$: “completing squares.” This technique is used in high school to prove the quadratic formula.

Indeed, to solve $ax^2 + bx + c = 0$, write

\[ ax^2 + bx + c = ax^2 + bx + \left(\frac{b}{2\sqrt{a}}\right)^2 - \left(\frac{b}{2\sqrt{a}}\right)^2 + c = 0, \tag{3.5.4} \]

which gives

\[ \left(\sqrt{a}\,x + \frac{b}{2\sqrt{a}}\right)^2 = \frac{b^2}{4a} - c. \tag{3.5.5} \]

Taking square roots gives

\[ \sqrt{a}\,x + \frac{b}{2\sqrt{a}} = \pm\sqrt{\frac{b^2 - 4ac}{4a}}, \quad \text{leading to the famous formula} \tag{3.5.6} \]

\[ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. \tag{3.5.7} \]

Clearly the functions

\[ \alpha_1\begin{pmatrix} x \\ y \end{pmatrix} = x + \frac{y}{2}, \quad \alpha_2\begin{pmatrix} x \\ y \end{pmatrix} = \frac{y}{2} \]

are linearly independent: no multiple of $y/2$ can give $x + y/2$. If we like, we can be systematic and write these functions as rows of a matrix:

\[ \begin{bmatrix} 1 & 1/2 \\ 0 & 1/2 \end{bmatrix}. \]

It is not necessary to row reduce this matrix to see that the rows are linearly independent.


This decomposition of $Q(x)$ is not the only possible one. For example, Exercise 3.5.7 asks you to derive two alternative decompositions.

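Example 3.5.5 (Quadratic form as a sum of squares).

\[ x^2 + xy = x^2 + xy + \frac{1}{4}y^2 - \frac{1}{4}y^2 = \left(x + \frac{y}{2}\right)^2 - \left(\frac{y}{2}\right)^2. \tag{3.5.8} \]

In this case, the linear functions are

\[ \alpha_1\begin{pmatrix} x \\ y \end{pmatrix} = x + \frac{y}{2} \quad \text{and} \quad \alpha_2\begin{pmatrix} x \\ y \end{pmatrix} = \frac{y}{2}. \tag{3.5.9} \]

Express the quadratic form $x^2 + xy - y^2$ as a sum of squares, checking your answer below.$^{16}$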


Example 3.5.6 (Completing squares: a more complicated example). Consider the quadratic form

\[ Q(x) = x^2 + 2xy - 4xz + 2yz - 4z^2. \tag{3.5.10} \]

We take all the terms in which $x$ appears, which gives us $x^2 + (2y - 4z)x$; here $B = 2y - 4z$, and we see that $B/2 = y - 2z$ will allow us to complete the square; adding and subtracting $(y - 2z)^2$ yields

\[ \begin{aligned} Q(x) &= (x + y - 2z)^2 - (y^2 - 4yz + 4z^2) + 2yz - 4z^2 \\ &= (x + y - 2z)^2 - y^2 + 6yz - 8z^2. \end{aligned} \tag{3.5.11} \]

Collecting all remaining terms in which $y$ appears and completing the square gives:

\[ Q(x) = (x + y - 2z)^2 - (y - 3z)^2 + (z)^2. \tag{3.5.12} \]

In this case, the linear functions are

\[ \alpha_1\begin{pmatrix} x \\ y \\ z \end{pmatrix} = x + y - 2z, \quad \alpha_2\begin{pmatrix} x \\ y \\ z \end{pmatrix} = y - 3z, \quad \text{and} \quad \alpha_3\begin{pmatrix} x \\ y \\ z \end{pmatrix} = z. \tag{3.5.13} \]

If we write each function as the row of a matrix and row reduce:

\[ \begin{bmatrix} 1 & 1 & -2 \\ 0 & 1 & -3 \\ 0 & 0 & 1 \end{bmatrix} \quad \text{row reduces to} \quad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \tag{3.5.14} \]

we see that the functions are linearly independent. △

The algorithm for completing squares should be pretty clear: as long as the square of some coordinate function actually figures in the expression, every appearance of that variable can be incorporated into a perfect square; by subtracting off that perfect square, you are left with a quadratic form in precisely one fewer variable. (The “precisely one fewer variable” guarantees linear independence.) This works when there is at least one square, but what should you do with something like the following?

$^{16}$ $x^2 + xy - y^2 = x^2 + xy + \dfrac{y^2}{4} - \dfrac{y^2}{4} - y^2 = \left(x + \dfrac{y}{2}\right)^2 - \left(\dfrac{\sqrt{5}\,y}{2}\right)^2.$


Example 3.5.7 (Quadratic form with no squares). Consider the quadratic form

There wasn’t anything magical 
about the choice of u, as Exercise 
3.5.8 asks you to show; almost 
anything would have done. 


There is another meaning one can imagine for linear independence, which applies to any functions $\alpha_1, \dots, \alpha_m$, not necessarily linear: one can interpret the equation

\[ c_1 \alpha_1 + \dots + c_m \alpha_m = 0 \]

as meaning that $(c_1 \alpha_1 + \dots + c_m \alpha_m)$ is the zero function: i.e., that

\[ (c_1 \alpha_1 + \dots + c_m \alpha_m)(x) = 0 \]

for any $x \in \mathbb{R}^n$, and say that $\alpha_1, \dots, \alpha_m$ are linearly independent if this forces $c_1 = \dots = c_m = 0$. In fact, these two meanings coincide: for a matrix to represent the linear transformation 0, it must be the 0 matrix (of whatever size is relevant, here $1 \times n$).


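\[ Q(x) = xy - xz + yz. \tag{3.5.15} \]

One possibility is to introduce the new variable $u = x - y$, so that we can trade $x$ for $u + y$, getting

\[ \begin{aligned} (u + y)y - (u + y)z + yz &= y^2 + uy - uz \\ &= \left(y + \frac{u}{2}\right)^2 - \frac{u^2}{4} - uz - z^2 + z^2 \\ &= \left(y + \frac{u}{2}\right)^2 - \left(\frac{u}{2} + z\right)^2 + z^2 \\ &= \left(\frac{x}{2} + \frac{y}{2}\right)^2 - \left(\frac{x}{2} - \frac{y}{2} + z\right)^2 + z^2. \end{aligned} \tag{3.5.16} \]

Again, to check that the functions

\[ \alpha_1\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \frac{x}{2} + \frac{y}{2}, \quad \alpha_2\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \frac{x}{2} - \frac{y}{2} + z, \quad \alpha_3\begin{pmatrix} x \\ y \\ z \end{pmatrix} = z \tag{3.5.17} \]

are linearly independent, we can write them as rows of a matrix:

\[ \begin{bmatrix} 1/2 & 1/2 & 0 \\ 1/2 & -1/2 & 1 \\ 0 & 0 & 1 \end{bmatrix} \quad \text{row reduces to} \quad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \tag{3.5.18} \]

△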
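The completing-squares procedure of the last two examples can be sketched as a short program. What follows is a minimal illustration under stated assumptions, not the formal proof of Appendix A.10: the quadratic form is given by a symmetric matrix $A$ with $Q(x) = x^\top A x$, the names `complete_squares` and `signature` are hypothetical, and in the case with no squares the sketch uses the identity $PQ = \frac{1}{4}\bigl((P + Q)^2 - (P - Q)^2\bigr)$ on two linear functions rather than the change of variables $u = x - y$ used in Example 3.5.7 (both devices create a square where none was present).

```python
from fractions import Fraction

def complete_squares(A):
    """A: symmetric n x n matrix with Q(x) = x^T A x.
    Returns a list of (c, form) such that Q(x) = sum of c * (form . x)^2."""
    n = len(A)
    A = [[Fraction(v) for v in row] for row in A]
    terms = []
    while True:
        i = next((r for r in range(n) if A[r][r] != 0), None)
        if i is not None:
            # A square x_i^2 is present: absorb every appearance of x_i.
            c = A[i][i]
            form = [A[i][s] / c for s in range(n)]
            terms.append((c, form))
            update = [[c * form[r] * form[s] for s in range(n)] for r in range(n)]
        else:
            pair = next(((r, s) for r in range(n) for s in range(r + 1, n)
                         if A[r][s] != 0), None)
            if pair is None:
                return terms          # nothing left: Q has been used up
            i, j = pair
            a = A[i][j]
            fi, fj = A[i][:], A[j][:]
            # (2/a) fi*fj = (1/(2a)) * ((fi + fj)^2 - (fi - fj)^2)
            terms.append((1 / (2 * a), [p + q for p, q in zip(fi, fj)]))
            terms.append((-1 / (2 * a), [p - q for p, q in zip(fi, fj)]))
            update = [[(fi[r] * fj[s] + fj[r] * fi[s]) / a for s in range(n)]
                      for r in range(n)]
        A = [[A[r][s] - update[r][s] for s in range(n)] for r in range(n)]

def signature(terms):
    """(number of positive coefficients, number of negative coefficients)."""
    return (sum(c > 0 for c, _ in terms), sum(c < 0 for c, _ in terms))

half = Fraction(1, 2)
A_356 = [[1, 1, -2], [1, 0, 1], [-2, 1, -4]]              # x^2+2xy-4xz+2yz-4z^2
A_357 = [[0, half, -half], [half, 0, half], [-half, half, 0]]  # xy - xz + yz
terms_356 = complete_squares(A_356)
terms_357 = complete_squares(A_357)
print(signature(terms_356), signature(terms_357))
```

For the form of Example 3.5.6 the sketch returns the functions $x + y - 2z$, $y - 3z$ and $z$ with coefficients $1, -1, 1$. For the form with no squares it returns $\frac{x}{2} + \frac{y}{2}$, $-\frac{x}{2} + \frac{y}{2} - z$ and $z$, again with coefficients $1, -1, 1$; the middle function differs from the one in Example 3.5.7 only by a sign, which does not change its square.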


Theorem 3.5.3 says that a quadratic form can be expressed as a sum of squares of linearly independent linear functions of its variables, but it does not say that whenever a quadratic form is expressed as a sum of squares, the linear functions being squared are necessarily linearly independent.


Example 3.5.8 (Squares that are not linearly independent). We can write

\[ 2x^2 + 2y^2 + 2xy = x^2 + y^2 + (x + y)^2 \tag{3.5.19} \]

or

\[ 2x^2 + 2y^2 + 2xy = \left(\sqrt{2}\,x + \frac{y}{\sqrt{2}}\right)^2 + \left(\sqrt{\frac{3}{2}}\,y\right)^2. \tag{3.5.20} \]

Only the second decomposition reflects Theorem 3.5.3. In the first, the linear functions $x$, $y$ and $x + y$ are not linearly independent, since $x + y$ is a linear combination of $x$ and $y$.


Proof of Theorem 3.5.3 


Definition 3.5.9 is equivalent to saying that a quadratic form is positive definite if its signature is $(n, 0)$ and negative definite if its signature is $(0, n)$, as Exercise 3.5.14 asks you to show.

The fact that the quadratic form of Example 3.5.10 is negative definite means that the Laplacian in one dimension (i.e., the transformation that takes $p$ to $p''$) is negative. This has important ramifications; for example, it leads to stable equilibria in elasticity.


When we write $Q(p)$ we mean that $Q$ is a function of the coefficients of $p$. For example, if $p = x^2 + 2x + 1$, then $Q(p) = Q\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}$.

All the essential ideas for the proof of Theorem 3.5.3, part (a) are contained in the examples; a formal proof is in Appendix A.10.

Before proving part (b), which says that the signature $(k, l)$ of a quadratic form does not depend on the specific linear functions chosen for its decomposition, we need to introduce some new vocabulary.


Definition 3.5.9 (Positive and negative definite). A quadratic form $Q(x)$ is positive definite if and only if $Q(x) > 0$ when $x \ne 0$. It is negative definite if and only if $Q(x) < 0$ when $x \ne 0$.


The fundamental example of a positive definite quadratic form is $Q(x) = |x|^2$. The quadratic form of Example 3.5.2,

\[ Q(p) = \int_0^1 (p(t))^2 \, dt, \quad \text{is also positive definite.} \tag{3.5.1} \]

Here is an important example of a negative definite quadratic form. 


Example 3.5.10 (Negative definite quadratic form). Let $P_k$ be the space of polynomials of degree $\le k$, and $V_{a,b} \subset P_k$ the space of polynomials $p$ that vanish at $a$ and $b$ for some $a < b$. Consider the quadratic form $Q : V_{a,b} \to \mathbb{R}$ given by

\[ Q(p) = \int_a^b p(t)\,p''(t)\,dt. \tag{3.5.21} \]

Using integration by parts,

\[ Q(p) = \int_a^b p(t)\,p''(t)\,dt = \underbrace{p(b)p'(b) - p(a)p'(a)}_{=\,0 \text{ by def.}} - \int_a^b (p'(t))^2\,dt < 0. \tag{3.5.22} \]

Since $p \in V_{a,b}$, $p(a) = p(b) = 0$ by definition; the integral is negative unless $p' = 0$ (i.e., unless $p$ is constant); the only constant in $V_{a,b}$ is 0. △
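The integration by parts can be checked on a concrete polynomial with exact arithmetic. In this sketch a polynomial is a list of coefficients in increasing degree; the helper names `deriv`, `mul`, `integral` and `Q`, and the sample polynomial $p(t) = t - t^2$ (which vanishes at $a = 0$ and $b = 1$), are assumptions chosen for the illustration.

```python
from fractions import Fraction

def deriv(p):
    """Derivative of the polynomial sum(p[i] * t**i)."""
    return [i * c for i, c in enumerate(p)][1:]

def mul(p, q):
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, c in enumerate(p):
        for j, d in enumerate(q):
            r[i + j] += c * d
    return r

def integral(p, a, b):
    """Exact integral of the polynomial p over [a, b]."""
    return sum(c * (Fraction(b) ** (i + 1) - Fraction(a) ** (i + 1)) / (i + 1)
               for i, c in enumerate(p))

def Q(p, a, b):
    """Integral over [a, b] of p(t) * p''(t)."""
    return integral(mul(p, deriv(deriv(p))), a, b)

p = [0, 1, -1]                      # p(t) = t - t^2, so p(0) = p(1) = 0
dp = deriv(p)
lhs = Q(p, 0, 1)
rhs = -integral(mul(dp, dp), 0, 1)  # minus the integral of (p')^2
print(lhs, rhs)
```

Both sides come out to $-1/3$: the boundary term vanishes because $p(0) = p(1) = 0$, and what remains is minus the integral of a square.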


Proof of Theorem 3.5.3 (b) 

Now to prove that the signature $(k, l)$ of a quadratic form $Q$ on $\mathbb{R}^n$ depends only on $Q$ and not on how $Q$ is decomposed. This follows from Proposition 3.5.11.

Recall that when a quadratic form is written as a “sum” of squares of linearly independent functions, $k$ is the number of squares preceded by a plus sign, and $l$ is the number of squares preceded by a minus sign.


Proposition 3.5.11. The number k is the largest dimension of a subspace of 
R n on which Q is positive definite and the number l is the largest dimension 
of a subspace on which Q is negative definite. 

Proof. First let us show that $Q$ cannot be positive definite on any subspace of dimension $> k$. Suppose

\[ Q(x) = \underbrace{\bigl(\alpha_1(x)^2 + \dots + \alpha_k(x)^2\bigr)}_{k \text{ terms}} - \underbrace{\bigl(\alpha_{k+1}(x)^2 + \dots + \alpha_{k+l}(x)^2\bigr)}_{l \text{ terms}} \tag{3.5.23} \]

is a decomposition of $Q$ into squares of linearly independent linear functions, and that $W \subset \mathbb{R}^n$ is a subspace of dimension $k_1 > k$. Consider the linear transformation $W \to \mathbb{R}^k$ given by

\[ w \mapsto \begin{bmatrix} \alpha_1(w) \\ \vdots \\ \alpha_k(w) \end{bmatrix}. \tag{3.5.24} \]

Since the domain has dimension $k_1$, which is greater than the dimension $k$ of the range, this mapping has a non-trivial kernel. Let $w \ne 0$ be an element of this kernel. Then, since the terms $\alpha_1(w)^2 + \dots + \alpha_k(w)^2$ vanish, we have

\[ Q(w) = -\bigl(\alpha_{k+1}(w)^2 + \dots + \alpha_{k+l}(w)^2\bigr) \le 0. \tag{3.5.25} \]

“Non-trivial” kernel means the kernel is not $\{0\}$.

So $Q$ cannot be positive definite on any subspace of dimension $> k$.

Now we need to exhibit a subspace of dimension $k$ on which $Q$ is positive definite. So far we have $k + l$ linearly independent linear functions $\alpha_1, \dots, \alpha_{k+l}$. Add to this set linear functions $\alpha_{k+l+1}, \dots, \alpha_n$ such that $\alpha_1, \dots, \alpha_n$ form a maximal family of linearly independent linear functions, i.e., a basis of the space of $1 \times n$ row matrices (see Exercise 2.4.12).

Consider the linear transformation $T : \mathbb{R}^n \to \mathbb{R}^{n-k}$ given by

\[ T : x \mapsto \begin{bmatrix} \alpha_{k+1}(x) \\ \vdots \\ \alpha_n(x) \end{bmatrix}. \tag{3.5.26} \]

The rows of the matrix corresponding to $T$ are thus the linearly independent row matrices $\alpha_{k+1}, \dots, \alpha_n$; like $Q$, they are defined on $\mathbb{R}^n$, so the matrix $T$ is $n$ wide. It is $n - k$ tall.

Let us see that $\ker T$ has dimension $k$, and is thus a subspace of dimension $k$ on which $Q$ is positive definite. The rank of $T$ is equal to the number of its linearly independent rows (Theorem 2.5.13), i.e., $\dim \operatorname{Img} T = n - k$, so by the dimension formula,

\[ \dim \ker T + \underbrace{\dim \operatorname{Img} T}_{n - k} = n, \quad \text{i.e.,} \quad \dim \ker T = k. \tag{3.5.27} \]

The quadratic form of Example 3.5.5 has rank 2; the quadratic form of Example 3.5.6 has rank 3.


It follows from Exercise 3.5.14 that only nondegenerate forms can be positive definite or negative definite.


For any $v \in \ker T$, the terms $\alpha_{k+1}(v), \dots, \alpha_{k+l}(v)$ of $Q(v)$ vanish, so

\[ Q(v) = \alpha_1(v)^2 + \dots + \alpha_k(v)^2 \ge 0. \tag{3.5.28} \]

If $Q(v) = 0$, this means that every term is zero, so

\[ \alpha_1(v) = \dots = \alpha_n(v) = 0, \tag{3.5.29} \]

which implies that $v = 0$. So we see that if $v \ne 0$, $Q$ is strictly positive.

The argument for $l$ is identical. □

Proof of Theorem 3.5.3(b). Since the proof of Proposition 3.5.11 says 
nothing about any particular choice of decomposition, we see that k and l 
depend only on the quadratic form, not on the particular linearly independent 
functions we use to represent it as a sum of squares. □ 

Classification of quadratic forms 

Definition 3.5.12 (Rank of a quadratic form). The rank of a quadratic form on $\mathbb{R}^n$ is the number of linearly independent squares that appear when the quadratic form is represented as a sum of linearly independent squares.


Definition 3.5.13 (Degenerate and nondegenerate quadratic forms). A quadratic form on $\mathbb{R}^n$ with rank $m$ is nondegenerate if $m = n$. It is degenerate if $m < n$.


The examples we have seen so far in this section are all nondegenerate; a 
degenerate one is shown in Example 3.5.15. 

The following proposition is important; we will use it to prove Theorem 3.6.6 
about using quadratic forms to classify critical points of functions. 

Proposition 3.5.14. If $Q : \mathbb{R}^n \to \mathbb{R}$ is a positive definite quadratic form, then there exists a constant $C > 0$ such that

\[ Q(x) \ge C|x|^2 \tag{3.5.30} \]

for all $x \in \mathbb{R}^n$.


Proof. Since $Q$ has rank $n$, we can write

\[ Q(x) = (\alpha_1(x))^2 + \dots + (\alpha_n(x))^2 \tag{3.5.31} \]

as a sum of squares of $n$ linearly independent functions. The linear transformation $T : \mathbb{R}^n \to \mathbb{R}^n$ whose rows are the $\alpha_i$ is invertible.

Since $Q$ is positive definite, all the squares in Equation 3.5.31 are preceded by plus signs, and we can consider $Q(x)$ as the length squared of the vector $Tx$. Thus we have

\[ Q(x) = |Tx|^2 \ge \frac{|x|^2}{|T^{-1}|^2}, \tag{3.5.32} \]

so you can take $C = 1/|T^{-1}|^2$. (For the inequality in Equation 3.5.32, recall that $|x| = |T^{-1}Tx| \le |T^{-1}||Tx|$.) □

Another proof (shorter and less constructive) is sketched in Exercise 3.5.15.

Of course Proposition 3.5.14 applies equally well to negative definite quadratic forms; just use $-C$.
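The constant $C = 1/|T^{-1}|^2$ can be computed for a concrete form. The sketch below uses the decomposition of Equation 3.5.20, $2x^2 + 2y^2 + 2xy = (\sqrt{2}\,x + y/\sqrt{2})^2 + (\sqrt{3/2}\,y)^2$, and it assumes that the length $|T^{-1}|$ of a matrix means the square root of the sum of the squares of its entries; the helper names are hypothetical. It then compares $C$ with the smallest value of $Q$ found on a sample of points of the unit circle, where $|x|^2 = 1$.

```python
import math

# Rows of T are the linear functions alpha_1, alpha_2 of Equation 3.5.20.
T = [[math.sqrt(2), 1 / math.sqrt(2)],
     [0.0, math.sqrt(1.5)]]

def inverse_2x2(m):
    (p, q), (r, s) = m
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def length_squared(m):
    """Sum of the squares of the entries (assumed meaning of |T|^2)."""
    return sum(v * v for row in m for v in row)

def Q(x, y):
    return 2 * x * x + 2 * y * y + 2 * x * y

C = 1 / length_squared(inverse_2x2(T))
worst = min(Q(math.cos(t), math.sin(t))
            for t in (2 * math.pi * i / 360 for i in range(360)))
print(C, worst)
```

Here $C$ comes out to $3/4$, while the sampled minimum of $Q$ on the unit circle is about 1, so the bound $Q(x) \ge C|x|^2$ holds with room to spare. The proposition only asserts that some positive constant works, not that this one is the best possible.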


Example 3.5.15 (Degenerate quadratic form). The quadratic form

\[ Q(p) = \int_0^1 (p'(t))^2 \, dt \tag{3.5.33} \]

on the space $P_k$ of polynomials of degree at most $k$ is a degenerate quadratic form, because $Q$ vanishes on the constant polynomials. △


3.6 Classifying Critical Points of Functions 



Figure 3.6.1.

The graph of $x^2 - y^2$, a typical saddle.


By “strict maximum” we mean $f(x_0) > f(x)$, not $f(x_0) \ge f(x)$; by “strict minimum” we mean $f(x_0) < f(x)$, not $f(x_0) \le f(x)$.


In this section we see what the quadratic terms of a function's Taylor polynomial tell us about the function's behavior. The quadratic terms of a function's Taylor polynomial constitute a quadratic form. If that quadratic form is nondegenerate (which is usually the case), its signature tells us whether a critical point (a point where the first derivative vanishes) is a minimum of the function, a maximum, or a saddle (illustrated by Figure 3.6.1).

Finding maxima and minima 

A standard application of one-variable calculus is to find the maxima or minima 
of functions by finding the places where the derivative vanishes, according to 
the following theorem, which elaborates on Proposition 1.6.8. 

Theorem 3.6.1. (a) Let $U \subset \mathbb{R}$ be an open interval and $f : U \to \mathbb{R}$ be a differentiable function. If $x_0 \in U$ is a maximum or a minimum of $f$, then $f'(x_0) = 0$.

(b) If $f$ is twice differentiable, and if $f'(x_0) = 0$ and $f''(x_0) < 0$, then $x_0$ is a strict local maximum of $f$, i.e., there exists a neighborhood $V \subset U$ of $x_0$ such that $f(x_0) > f(x)$ for all $x \in V - \{x_0\}$.

(c) If $f$ is twice differentiable, and if $f'(x_0) = 0$ and $f''(x_0) > 0$, then $x_0$ is a strict local minimum of $f$, i.e., there exists a neighborhood $V \subset U$ of $x_0$ such that $f(x_0) < f(x)$ for all $x \in V - \{x_0\}$.

The plural of extremum is extrema.


Part (a) of Theorem 3.6.1 generalizes in the most obvious way. So as not to privilege maxima over minima, we define an extremum of a function to be either a maximum or a minimum.


Note that the equation $[Df(x)] = 0$ is really $n$ equations in $n$ variables, just the kind of thing Newton's method is suited to. Indeed, one important use of Newton's method is finding maxima and minima of functions.


In Definition 3.6.3, saying that the derivative vanishes means that all the partial derivatives vanish. Finding a place where all partial derivatives vanish means solving $n$ equations in $n$ unknowns. Usually there is no better approach than applying Newton's method, and finding critical points is an important application of Newton's method.


Theorem 3.6.2 (Derivative zero at an extremum). Let $U \subset \mathbb{R}^n$ be an open subset and $f : U \to \mathbb{R}$ be a differentiable function. If $x_0 \in U$ is an extremum of $f$, then $[Df(x_0)] = 0$.

Proof. The derivative is given by the Jacobian matrix, so it is enough to show that if $x_0$ is an extremum of $f$, then $D_i f(x_0) = 0$ for all $i = 1, \dots, n$. But $D_i f(x_0) = g'(0)$, where $g$ is the function of one variable $g(t) = f(x_0 + t e_i)$, and our hypothesis also implies that $g$ has an extremum at $t = 0$, so $g'(0) = 0$ by Theorem 3.6.1. □

It is not true that every point at which the derivative vanishes is an extremum. When we find such a point (called a critical point), we will have to work harder to determine whether it is indeed a maximum or minimum.

Definition 3.6.3 (Critical point). Let $U \subset \mathbb{R}^n$ be open, and $f : U \to \mathbb{R}$ be a differentiable function. A critical point of $f$ is a point where the derivative vanishes.


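Example 3.6.4 (Finding critical points). What are the critical points of the function

\[ f\begin{pmatrix} x \\ y \end{pmatrix} = x + x^2 + xy + y^3? \tag{3.6.1} \]

The partial derivatives are

\[ D_1 f\begin{pmatrix} x \\ y \end{pmatrix} = 1 + 2x + y, \quad D_2 f\begin{pmatrix} x \\ y \end{pmatrix} = x + 3y^2. \tag{3.6.2} \]

In this case we don't need Newton's method, since the system can be solved explicitly: substitute $x = -3y^2$ from the second equation into the first, to find

\[ 1 + y - 6y^2 = 0; \quad \text{i.e.,} \quad y = \frac{1 \pm \sqrt{1 + 24}}{12}, \quad \text{so } y = \frac{1}{2} \text{ or } y = -\frac{1}{3}. \tag{3.6.3} \]

Substituting this into $x = -(1 + y)/2$ (or into $x = -3y^2$) gives two critical points:

\[ a_1 = \begin{pmatrix} -3/4 \\ 1/2 \end{pmatrix} \quad \text{and} \quad a_2 = \begin{pmatrix} -1/3 \\ -1/3 \end{pmatrix}. \quad △ \tag{3.6.4} \]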
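The two critical points can be confirmed with exact rational arithmetic. This is a small sketch; the function name `grad` and the variable names are assumptions made for the illustration. It takes the two roots of $6y^2 - y - 1 = 0$, recovers $x = -3y^2$, and evaluates both partial derivatives $1 + 2x + y$ and $x + 3y^2$ at each point.

```python
from fractions import Fraction as Fr

def grad(x, y):
    """Partial derivatives of f(x, y) = x + x^2 + xy + y^3."""
    return (1 + 2 * x + y, x + 3 * y * y)

# Roots of 6y^2 - y - 1 = 0 are (1 +/- 5) / 12.
ys = [Fr(1 + 5, 12), Fr(1 - 5, 12)]
points = [(-3 * y * y, y) for y in ys]

for x, y in points:
    print((x, y), grad(x, y))
```

Both partial derivatives vanish at each of the two points, so both are critical points in the sense of Definition 3.6.3; whether either is an extremum is a separate question.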


Remark 3.6.5 (Maxima on closed sets). Just as in the case of one variable, a major problem in using Theorem 3.6.2 is the hypothesis that $U$ is open. Often we want to find an extremum of a function on a closed set, for instance the maximum of $x^2$ on $[0, 2]$. The maximum, which is 4, occurs when $x = 2$, which is not a point where the derivative of $x^2$ vanishes. Especially when we have used Theorem 1.6.7 to assert that a maximum exists in a compact subset, we need to check that this maximum occurs in the interior of the region under consideration, not on the boundary, before we can say that it is a critical point. △

In Section 3.7, we will see that sometimes we can analyze the behavior of the function restricted to the boundary, and use the variant of critical point theory developed there.


In evaluating the second derivative, remember that $D_1^2 f(a)$ means the second partial derivative $D_1 D_1 f$, evaluated at $a$. It does not mean $D_1^2$ times $f(a)$. In this case we have $D_1^2 f = 2$ and $D_1 D_2 f = 1$; these are constants, so where we evaluate the derivative doesn't matter. But $D_2^2 f = 6y$; evaluated at $a_1$ this gives 3.

The second derivative criterion 

Is either of the critical points given by Equation 3.6.4 an extremum? In one
variable, we would answer this question by looking at the sign of the second
derivative. The right generalization of “the second derivative” to higher
dimensions is “the quadratic form given by the quadratic terms of the Taylor
polynomial.” It seems reasonable to hope that since (like every sufficiently
differentiable function) the function is well approximated near these points by its
Taylor polynomial, the function should behave like its Taylor polynomial.

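Let us apply this to the function in Example 3.6.4, $f\begin{pmatrix} x \\ y \end{pmatrix} = x + x^2 + xy + y^3$.
Evaluating its Taylor polynomial at $\mathbf{a}_1 = \begin{pmatrix} -3/4 \\ 1/2 \end{pmatrix}$, we get

$$P^2_{f,\mathbf{a}_1}(\mathbf{a}_1 + \vec{h}) = \underbrace{-\frac{7}{16}}_{f(\mathbf{a}_1)} + \underbrace{\frac{1}{2}\,2h_1^2 + h_1h_2 + \frac{1}{2}\,3h_2^2}_{\text{second derivative}}. \tag{3.6.5}$$

The second derivative is a positive definite quadratic form:

$$h_1^2 + h_1h_2 + \frac{3}{2}h_2^2 = \left(h_1 + \frac{h_2}{2}\right)^2 + \frac{5}{4}h_2^2, \quad \text{with signature } (2, 0). \tag{3.6.6}$$

What happens at the critical point $\mathbf{a}_2 = \begin{pmatrix} -1/3 \\ -1/3 \end{pmatrix}$? Check your answer below.¹⁷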

How should we interpret these results? If we believe that the function behaves
near a critical point like its second degree Taylor polynomial, then the critical
point $\mathbf{a}_1$ is a minimum; as the increment vector $\vec{h} \to 0$, the quadratic form
goes to 0 as well, and as $\vec{h}$ gets bigger (i.e., we move further from $\mathbf{a}_1$), the
quadratic form gets bigger. Similarly, if at a critical point the second derivative
is a negative definite quadratic form, we would expect it to be a maximum. But
what about a critical point like $\mathbf{a}_2$, where the second derivative is a quadratic
form with signature (1,1)?

You may recall that even in one variable, a critical point is not necessarily an
extremum: if the second derivative vanishes also, there are other possibilities
(the point of inflection of $f(x) = x^3$, for instance). However, such points are
exceptional: zeroes of the first and second derivative do not usually coincide.
Ordinarily, for functions of one variable, critical points are extrema.

¹⁷ $P^2_{f,\mathbf{a}_2}(\mathbf{a}_2 + \vec{h}) = -\frac{4}{27} + \frac{1}{2}\,2h_1^2 + h_1h_2 + \frac{1}{2}(-2)h_2^2$, with quadratic form
$h_1^2 + h_1h_2 - h_2^2 = \left(h_1 + \frac{h_2}{2}\right)^2 - \frac{5}{4}h_2^2$, which has signature (1,1).

The quadratic form in Equation 3.6.7 is the second degree term of the Taylor
polynomial of $f$ at $\mathbf{a}$.

We state the theorem as we do, rather than saying simply that the quadratic
form is not positive definite, or that it is not negative definite, because if a
quadratic form on $\mathbb{R}^n$ is degenerate (i.e., $k + l < n$), then if its signature is
$(k, 0)$, it is positive, but not positive definite, and the signature does not tell
you that there is a local minimum. Similarly, if the signature is $(0, k)$, it does
not tell you that there is a local maximum.

We will say that a critical point has signature $(k, l)$ if the corresponding
quadratic form has signature $(k, l)$. For example, $x^2 + y^2 - z^2$ has a saddle of
signature (2,1) at the origin.

The origin is a saddle for the function $x^2 - y^2$.

This is not the case in higher dimensions. The right generalization of “the
second derivative of $f$ does not vanish” is “the quadratic terms of the Taylor
polynomial are a nondegenerate quadratic form.” A critical point at which this
happens is called a nondegenerate critical point. This is the ordinary course of
events (degeneracy requires coincidences). But a nondegenerate critical point
need not be an extremum. Even in two variables, there are three signatures
of nondegenerate quadratic forms: (2,0), (1,1) and (0,2). The first and third
correspond to extrema, but signature (1,1) corresponds to a saddle point.

The following theorems confirm that the above idea really works.

Theorem 3.6.6 (Quadratic forms and extrema). Let $U \subset \mathbb{R}^n$ be an
open set, $f : U \to \mathbb{R}$ be twice continuously differentiable (i.e., of class $C^2$),
and let $\mathbf{a} \in U$ be a critical point of $f$, i.e., $[\mathbf{D}f(\mathbf{a})] = 0$.

(a) If the quadratic form

$$Q(\vec{h}) = \sum_{I \in \mathcal{I}_n^2} \frac{1}{I!}\left(D_I f(\mathbf{a})\right)\vec{h}^I \tag{3.6.7}$$

is positive definite (i.e., has signature $(n, 0)$), then $\mathbf{a}$ is a strict local minimum
of $f$. If the signature of the quadratic form is $(k, l)$ with $l > 0$, then the critical
point is not a local minimum.

(b) If the quadratic form is negative definite (i.e., has signature $(0, n)$),
then $\mathbf{a}$ is a strict local maximum of $f$. If the signature of the quadratic form
is $(k, l)$ with $k > 0$, then the critical point is not a local maximum.


Definition 3.6.7 (Saddle). If the quadratic form has signature $(k, l)$ with
$k > 0$ and $l > 0$, then the critical point is a saddle.


Theorem 3.6.8 (Behavior of functions near saddle points). Let $U \subset
\mathbb{R}^n$ be an open set, and let $f : U \to \mathbb{R}$ be a $C^2$ function. If $f$ has a saddle at
$\mathbf{a} \in U$, then in every neighborhood of $\mathbf{a}$ there are points $\mathbf{b}$ with $f(\mathbf{b}) > f(\mathbf{a})$,
and points $\mathbf{c}$ with $f(\mathbf{c}) < f(\mathbf{a})$.
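
For two variables, the classification in Theorem 3.6.6 and Definition 3.6.7 can be sketched in a few lines: complete the square to read off the signature of $Q(\vec{h}) = a h_1^2 + 2b\,h_1h_2 + c\,h_2^2$. The function names below are hypothetical, and the coefficients passed in are those of the quadratic terms of the Taylor polynomial of Example 3.6.4, namely $h_1^2 + h_1h_2 + 3y\,h_2^2$ at a critical point with second coordinate $y$.

```python
from fractions import Fraction as Fr

def signature_2x2(a, b, c):
    """Signature (k, l) of Q(h) = a*h1^2 + 2*b*h1*h2 + c*h2^2, by completing squares."""
    if a != 0:
        # Q = a*(h1 + (b/a)*h2)^2 + (c - b^2/a)*h2^2
        coeffs = [a, c - Fr(b * b) / a]
    elif c != 0:
        # Same trick with the roles of h1 and h2 exchanged (here a = 0).
        coeffs = [c, -Fr(b * b) / c]
    elif b != 0:
        # Q = 2*b*h1*h2 = (b/2)*((h1 + h2)^2 - (h1 - h2)^2)
        coeffs = [b, -b]
    else:
        coeffs = [0, 0]
    k = sum(1 for t in coeffs if t > 0)
    l = sum(1 for t in coeffs if t < 0)
    return (k, l)

def quadratic_terms(y):
    """Coefficients (a, b, c) of h1^2 + h1*h2 + 3*y*h2^2 for Example 3.6.4."""
    return (Fr(1), Fr(1, 2), 3 * Fr(y))

print(signature_2x2(*quadratic_terms(Fr(1, 2))))   # critical point a1
print(signature_2x2(*quadratic_terms(Fr(-1, 3))))  # critical point a2
```

A result of (2, 0) indicates a strict local minimum, (0, 2) a strict local maximum, and (1, 1) a saddle; a degenerate form has $k + l < 2$, and then the theorem gives no conclusion.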

Proof of 3.6.6 (Quadratic forms and extrema). We will treat case (a)
only; case (b) can be derived from it by considering $-f$ rather than $f$.

We can write

$$f(\mathbf{a} + \vec{h}) = f(\mathbf{a}) + Q(\vec{h}) + r(\vec{h}), \tag{3.6.8}$$

where the remainder $r(\vec{h})$ satisfies

$$\lim_{\vec{h} \to 0} \frac{r(\vec{h})}{|\vec{h}|^2} = 0. \tag{3.6.9}$$

Thus if $Q$ is positive definite,

$$\frac{f(\mathbf{a} + \vec{h}) - f(\mathbf{a})}{|\vec{h}|^2} = \frac{Q(\vec{h}) + r(\vec{h})}{|\vec{h}|^2} \ge C + \frac{r(\vec{h})}{|\vec{h}|^2}, \tag{3.6.10}$$

where $C$ is the constant of Proposition 3.5.14: the constant $C > 0$ such that
$Q(\vec{x}) \ge C|\vec{x}|^2$ for all $\vec{x} \in \mathbb{R}^n$, when $Q$ is positive definite.

The right-hand side is positive for $\vec{h}$ sufficiently small (see Equation 3.6.9),
so the left-hand side is also, i.e., $f(\mathbf{a} + \vec{h}) > f(\mathbf{a})$ for $\vec{h}$ sufficiently small; i.e., $\mathbf{a}$
is a strict local minimum of $f$.

If $Q$ has signature $(k, l)$ with $l > 0$, then there is a subspace $V \subset \mathbb{R}^n$ of
dimension $l$ on which $Q$ is negative definite. Suppose that $Q$ is given by the
quadratic terms of the Taylor polynomial of $f$ at a critical point $\mathbf{a}$ of $f$. Then
the same argument as above shows that if $\vec{h} \in V$ and $|\vec{h}|$ is sufficiently small,
then the increment $f(\mathbf{a} + \vec{h}) - f(\mathbf{a})$ will be negative, certainly preventing $\mathbf{a}$ from
being a minimum of $f$. □

Equation 3.6.10 uses Proposition 3.5.14.

The constant $C$ depends on $Q$, not on the vector on which $Q$ is evaluated, so
$Q(\vec{h}) \ge C|\vec{h}|^2$; i.e.,

$$\frac{Q(\vec{h})}{|\vec{h}|^2} \ge \frac{C|\vec{h}|^2}{|\vec{h}|^2} = C.$$


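Proof of 3.6.8 (Behavior of functions near saddle points). Write

$$f(\mathbf{a} + \vec{h}) = f(\mathbf{a}) + Q(\vec{h}) + r(\vec{h}) \quad \text{and} \quad \lim_{\vec{h} \to 0} \frac{r(\vec{h})}{|\vec{h}|^2} = 0,$$

as in Equations 3.6.8 and 3.6.9.

By Theorem 3.5.11 there exist subspaces $V$ and $W$ of $\mathbb{R}^n$ such that $Q$ is
positive definite on $V$ and negative definite on $W$.

If $\vec{h} \in V$, and $t > 0$, there exists $C > 0$ such that

$$\frac{f(\mathbf{a} + t\vec{h}) - f(\mathbf{a})}{t^2} = \frac{t^2 Q(\vec{h}) + r(t\vec{h})}{t^2} \ge C|\vec{h}|^2 + \frac{r(t\vec{h})}{t^2}, \tag{3.6.11}$$

and since

$$\lim_{t \to 0} \frac{r(t\vec{h})}{t^2} = 0, \tag{3.6.12}$$

it follows that $f(\mathbf{a} + t\vec{h}) > f(\mathbf{a})$ for $t > 0$ sufficiently small. □

A similar argument about $W$ shows that there are also points $\mathbf{c}$ where
$f(\mathbf{c}) < f(\mathbf{a})$. Exercise 3.6.3 asks you to spell out this argument.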


Degenerate critical points 

When $f(\mathbf{x})$ has a critical point at $\mathbf{a}$, such that the quadratic terms of the
Taylor polynomial of $f$ at $\mathbf{a}$ are a nondegenerate quadratic form, the function
near $\mathbf{a}$ behaves just like that quadratic form. We have just proved this when the
quadratic form is positive or negative definite, and the only thing preventing
us from proving it for any signature of a nondegenerate form is an accurate
definition of “behave just like its quadratic terms.”¹⁸

But if the quadratic form is degenerate, there are many possibilities; we will not
attempt to classify them (it is a big job), but simply give some examples.




FIGURE 3.6.2. The upper left-hand figure is the surface of equation $z = x^2 + y^3$, and
the upper right-hand figure is the surface of equation $z = x^2 + y^4$. The lower left-hand
figure is the surface of equation $z = x^2 - y^4$. Although the three graphs look very
different, all three functions have the same degenerate quadratic form for the Taylor
polynomial of degree 2. The lower right-hand figure shows the monkey saddle; it is
the graph of $z = x^3 - 2xy^2$, whose quadratic form is 0.

Example 3.6.9 (Degenerate critical points). The three functions $x^2 + y^3$,
$x^2 + y^4$, and $x^2 - y^4$ all have the same degenerate quadratic form for the Taylor
polynomial of degree 2: $x^2$. But they behave very differently, as shown in Figure
3.6.2 (upper left, upper right and lower left). The second one has a minimum,
the other two do not. △
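
The difference between the three functions of Example 3.6.9 can be seen numerically. The sketch below samples each function on a small grid around the origin (excluding the origin itself) and records whether negative and positive values occur; the grid size and radius are arbitrary illustrative choices.

```python
def f_cubic(x, y):
    return x ** 2 + y ** 3

def f_quartic_plus(x, y):
    return x ** 2 + y ** 4

def f_quartic_minus(x, y):
    return x ** 2 - y ** 4

def sample_signs(f, radius=0.1, half_steps=20):
    """Return (has_negative, has_positive) for f on a grid near the origin."""
    step = radius / half_steps
    has_negative = has_positive = False
    for i in range(-half_steps, half_steps + 1):
        for j in range(-half_steps, half_steps + 1):
            if i == 0 and j == 0:
                continue  # skip the critical point itself, where f = 0
            value = f(i * step, j * step)
            has_negative = has_negative or value < 0
            has_positive = has_positive or value > 0
    return (has_negative, has_positive)

for f in (f_cubic, f_quartic_plus, f_quartic_minus):
    print(f.__name__, sample_signs(f))
```

Only $x^2 + y^4$ is positive at every sampled point, consistent with a minimum at the origin; the other two take negative values along the $y$-axis, even though all three share the quadratic form $x^2$.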

Example 3.6.10 (Monkey saddle). The function $f\begin{pmatrix} x \\ y \end{pmatrix} = x^3 - 2xy^2$ has
a critical point that goes up in three directions and down in three also (to
accommodate the tail). Its graph is shown in Figure 3.6.2, lower right. △

¹⁸ A precise statement is called the Morse lemma; it can be found (Lemma 2.2) on
p. 6 of J. Milnor, Morse Theory, Princeton University Press, 1963.

3.7 Constrained Critical Points and Lagrange Multipliers


Yet another example occurs in the optional subsection of Section 2.8: the
norm of a matrix $A$ is $\sup_{|\vec{x}| = 1} |A\vec{x}|$. What is $\sup |A\vec{x}|$ when we require that $\vec{x}$
have length 1?


The shortest path between two points is a straight line. But what is the shortest
path if you are restricted to paths that lie on a sphere (for example, because
you are flying from New York to Paris)? This example is intuitively clear but
actually quite difficult to address. In this section we will look at problems in
the same spirit, but easier. We will be interested in extrema of a function $f$
when $f$ is restricted to some manifold $X \subset \mathbb{R}^n$.

In the case of the set $X \subset \mathbb{R}^8$ describing the position of a link of four rods in
the plane (Example 3.2.1) we might imagine that the origin is attracting, and
that each vertex $\mathbf{x}_i$ has a “potential” $|\mathbf{x}_i|^2$, perhaps realized by rubber bands
connecting the origin to the joints. Then what is the equilibrium position,
where the link realizes the minimum of the potential energy? Of course, all
four vertices try to be at the origin, but they can't. Where will they go?

In this section we provide tools to answer this sort of question.


Finding constrained critical points using derivatives 


Recall (Definition 3.2.6) that $T_{\mathbf{a}}X$ is the tangent space to a manifold $X$ at $\mathbf{a}$.

Geometrically, Theorem 3.7.1 means that a critical point of $\varphi$ restricted to
$X$ is a point $\mathbf{a}$ such that the tangent space to the constraint, $T_{\mathbf{a}}X$, is a subspace
of $\ker[\mathbf{D}\varphi(\mathbf{a})]$, the tangent space to a level set of $\varphi$.


A characterization of extrema in terms of derivatives should say that in some
sense the derivative vanishes at an extremum. But when we take a function
defined on $\mathbb{R}^n$ and consider its restriction to a manifold of $\mathbb{R}^n$, we cannot assert
that an extremum of the restricted function is a point at which the derivative of
the function vanishes. The derivative of the function may vanish at points not
in the manifold (the shortest “unrestricted” path from New York to Paris would
require tunneling under the Atlantic Ocean). In addition, only very seldom will
a constrained maximum be an unconstrained maximum (the tallest child in
kindergarten is unlikely to be the tallest child in the entire elementary school).
So only very seldom will the derivative of the function vanish at a critical point
of the restricted function.

What we can say is that at an extremum of the function restricted to a
manifold, the derivative of the function vanishes on all tangent vectors to the
manifold; i.e., on the tangent space to the manifold.

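Theorem 3.7.1. If $X \subset \mathbb{R}^n$ is a manifold, $U \subset \mathbb{R}^n$ is open, $\varphi : U \to \mathbb{R}$ is a
$C^1$ function and $\mathbf{a} \in X \cap U$ is a local extremum of $\varphi$ restricted to $X$, then

$$T_{\mathbf{a}}X \subset \ker[\mathbf{D}\varphi(\mathbf{a})]. \tag{3.7.1}$$

Definition 3.7.2 (Constrained critical point). A point $\mathbf{a}$ such that
$T_{\mathbf{a}}X \subset \ker[\mathbf{D}\varphi(\mathbf{a})]$ is called a critical point of $\varphi$ constrained to $X$.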





A level set of a function $\varphi$ is those points such that $\varphi = c$, where $c$ is some
constant. We used level sets in Section 3.1.


Example 3.7.3 (Constrained critical point: a simple example). Suppose
we wish to maximize the function $\varphi\begin{pmatrix} x \\ y \end{pmatrix} = xy$ on the first quadrant of the
circle $x^2 + y^2 = 1$. As shown in Figure 3.7.1, some level sets of that function do
not intersect the circle, and some intersect it in two points, but one, $xy = 1/2$,
intersects it at the point $\mathbf{a} = \begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix}$. That point is the critical point of $\varphi$
constrained to the circle. The tangent space to the constraint (i.e., to the circle)
consists of the vectors $\begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix}$ where $\dot{x} = -\dot{y}$. This tangent space is a subspace of
the tangent space to the level set $xy = 1/2$. In fact, the two are the same.
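
Example 3.7.3 can be checked by the parametrization idea: restrict $\varphi = xy$ to the quarter circle via $t \mapsto (\cos t, \sin t)$ and look for the unconstrained maximum in $t$. The sketch below uses a plain grid search (an illustrative choice, not a method from the text) and then verifies that $[\mathbf{D}\varphi(\mathbf{a})] = [y, x]$ vanishes on the tangent vector $(-\sin t, \cos t)$ at the maximizer.

```python
import math

def phi_on_circle(t):
    """phi(x, y) = x*y restricted to the circle via x = cos t, y = sin t."""
    return math.cos(t) * math.sin(t)

def argmax_on_grid(f, lo, hi, n=10001):
    """Return (t, f(t)) maximizing f over n evenly spaced points of [lo, hi]."""
    best_t, best_value = lo, f(lo)
    for i in range(1, n):
        t = lo + (hi - lo) * i / (n - 1)
        value = f(t)
        if value > best_value:
            best_t, best_value = t, value
    return best_t, best_value

t_star, max_value = argmax_on_grid(phi_on_circle, 0.0, math.pi / 2)
x, y = math.cos(t_star), math.sin(t_star)

# [D phi(a)] = [y, x] applied to the tangent vector (-sin t, cos t) of the circle.
derivative_on_tangent = y * (-math.sin(t_star)) + x * math.cos(t_star)

print(t_star, max_value, derivative_on_tangent)
```

The maximizer lands at $t = \pi/4$, i.e. at $x = y = 1/\sqrt{2}$ with $xy = 1/2$, and the derivative of $\varphi$ applied to the tangent vector is zero up to rounding.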





Example 3.7.4 (Constrained critical point in higher dimensions). Suppose
we wish to find the minimum of the function $\varphi\begin{pmatrix} x \\ y \\ z \end{pmatrix} = x^2 + y^2 + z^2$, when
it is constrained to the ellipse (denoted $X$) that is the intersection of the cylinder
$x^2 + y^2 = 1$ and the plane of equation $x = z$, shown in Figure 3.7.2.


Figure 3.7.1. The unit circle and several level curves of the function $xy$. The
level curve $xy = 1/2$, which realizes the maximum of $xy$ restricted to the circle,
is tangent to the circle at the point $\begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix}$, where the maximum is realized.

Since $\varphi$ measures the square of the distance from the origin, we wish to find
the points on the ellipse $X$ that are closest to the origin.



Figure 3.7.2. At the point $-\mathbf{a} = \begin{pmatrix} 0 \\ -1 \\ 0 \end{pmatrix}$, the distance to the origin has a minimum
on the ellipse; at this point, the tangent space to the ellipse is a subspace of the tangent
space to the sphere.



In keeping with the notation introduced in Section 3.1, $\dot{y} = 0$ indicates the
plane where there is zero increment in the direction of the $y$-axis; thus $\dot{y} = 0$
denotes the plane tangent to the sphere (and to the cylinder) at $\mathbf{a} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$;
it also denotes the plane tangent to the sphere (and the cylinder) at
$-\mathbf{a} = \begin{pmatrix} 0 \\ -1 \\ 0 \end{pmatrix}$. It is the plane $y = 0$ translated from the origin.

The proof of Theorem 3.7.1 is easier to understand if you think in terms of
parametrizations. Suppose we want to find a maximum of a function $\varphi\begin{pmatrix} x \\ y \end{pmatrix}$ on
the unit circle in $\mathbb{R}^2$. One approach is to parametrize the circle by
$t \mapsto \begin{pmatrix} \cos t \\ \sin t \end{pmatrix}$, and look for the unconstrained maximum of the new function of one
variable, $\varphi_1(t) = \varphi\begin{pmatrix} \cos t \\ \sin t \end{pmatrix}$. We did just this in Example 2.8.8. In this way, the
restriction is incorporated in the parametrization.

The points on the ellipse nearest the origin are $\mathbf{a} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$ and $-\mathbf{a} = \begin{pmatrix} 0 \\ -1 \\ 0 \end{pmatrix}$;
they are the minima of $\varphi$ constrained to $X$. In this case

$$\ker[\mathbf{D}\varphi(\mathbf{a})] = \ker[0, 2, 0] \quad \text{and} \quad \ker[\mathbf{D}\varphi(-\mathbf{a})] = \ker[0, -2, 0],$$

i.e., at these critical points, $\ker[\mathbf{D}\varphi]$ is the space $\dot{y} = 0$. The tangent space
to the ellipse at the points $\mathbf{a}$ and $-\mathbf{a}$ is the intersection of the planes of equation
$\dot{y} = 0$ (the tangent space to the cylinder) and $\dot{x} = \dot{z}$ (which is both the
plane and the tangent space to the plane). Certainly this is a subspace of
$\ker[\mathbf{D}\varphi]$. △



Figure 3.7.3. The composition $\varphi \circ \tilde{g}$: the parametrization $\tilde{g}$ takes a point in $\mathbb{R}^2$ to
the constraint manifold $X$; $\varphi$ takes it to $\mathbb{R}$. An extremum of the composition, at $\mathbf{a}_1$,
corresponds to an extremum of $\varphi$ restricted to $X$, at $\mathbf{a}$; the constraint is incorporated
into the parametrization.


Proof of Theorem 3.7.1. Since $X$ is a manifold, near $\mathbf{a}$, $X$ is the graph
of a map $g$ from some subset $U_1$ of the space $E_1$ spanned by $k$ standard basis
vectors, to the space $E_2$ spanned by the other $n - k$ standard basis vectors.
Call $\mathbf{a}_1$ and $\mathbf{a}_2$ the projections of $\mathbf{a}$ onto $E_1$ and $E_2$ respectively.

Then the mapping $\tilde{g}(\mathbf{x}_1) = \mathbf{x}_1 + g(\mathbf{x}_1)$ is a parametrization of $X$ near $\mathbf{a}$, and
$X$, which is locally the graph of $g$, is locally the image of $\tilde{g}$. Similarly, $T_{\mathbf{a}}X$,
which is locally the graph of $[\mathbf{D}g(\mathbf{a}_1)]$, is also locally the image of $[\mathbf{D}\tilde{g}(\mathbf{a}_1)]$.

Then saying that $\varphi$ on $X$ has a local extremum at $\mathbf{a}$ is the same as saying
that the composition $\varphi \circ \tilde{g}$ has an (unconstrained) extremum at $\mathbf{a}_1$, as sketched
in Figure 3.7.3. Thus $[\mathbf{D}(\varphi \circ \tilde{g})(\mathbf{a}_1)] = 0$. This means exactly that $[\mathbf{D}\varphi(\mathbf{a})]$
vanishes on the image of $[\mathbf{D}\tilde{g}(\mathbf{a}_1)]$, which is the tangent space $T_{\mathbf{a}}X$. □

This proof provides a straightforward approach to finding constrained critical
points, provided you know the “constraint manifold” by a parametrization.

Example 3.7.5 (Finding constrained critical points using a parametrization).
Say we want to find local critical points of the function
$\varphi\begin{pmatrix} x \\ y \\ z \end{pmatrix} = x + y + z$ on the surface parametrized by

$$g : \begin{pmatrix} u \\ v \end{pmatrix} \mapsto \begin{pmatrix} \sin uv + u \\ u + v \\ uv \end{pmatrix},$$

shown in Figure 3.7.4 (left). Instead of looking for constrained critical points
of $\varphi$, we will look for (ordinary) critical points of $\varphi \circ g$. We have

$$\varphi \circ g = \sin uv + u + (u + v) + uv, \quad \text{so} \quad
\begin{aligned} D_1(\varphi \circ g) &= v\cos uv + 2 + v, \\ D_2(\varphi \circ g) &= u\cos uv + 1 + u; \end{aligned}$$

setting these both to 0 and solving them gives $2u - v = 0$. In the parameter
space, the critical points lie on this line, so the actual constrained critical points
lie on the image of that line by the parametrization. Plugging $v = 2u$ into
$D_1(\varphi \circ g) = 0$ gives

$$u\cos 2u^2 + u + 1 = 0, \tag{3.7.2}$$

whose graph is shown in Figure 3.7.4 (right).

Exercise 3.7.1 asks you to show that $g$ really is a parametrization.

We could have substituted $v = 2u$ into $D_2(\varphi \circ g)$ instead.


Figure 3.7.4. Left: The surface $X$ parametrized by $\begin{pmatrix} u \\ v \end{pmatrix} \mapsto \begin{pmatrix} \sin uv + u \\ u + v \\ uv \end{pmatrix}$. The
critical point where the white tangent plane is tangent to the surface
corresponds to $u = -1.48$.
Right: The graph of $u\cos 2u^2 + u + 1 = 0$. The roots of that equation, marked
with black dots, give values of the first coordinates of critical points of
$\varphi(\mathbf{x}) = x + y + z$ restricted to $X$.




This function has infinitely many zeroes, each one the first coordinate of a
critical point; the seven visible in Figure 3.7.4 are approximately $u \approx -2.878$,
$-2.722$, $-2.28$, $-2.048$, $-1.48$, $-.822$, $-.548$. The image of the line $v = 2u$ is
represented as a dark curve on the surface in Figure 3.7.4, together with the
tangent plane at the point corresponding to $u = -1.48$.

Solving that equation (not necessarily easy, of course) will give us the values
of the first coordinate (and because $2u = v$, of the second coordinate) of the
points that are critical points of $\varphi$ constrained to $X$.
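
The seven roots quoted above can be reproduced with a short numerical sketch: scan the interval $[-3, 0]$ for sign changes of $u\cos 2u^2 + u + 1$ and refine each one by bisection. Bisection is used here only because it is easy to verify; the interval, step count and function names are illustrative assumptions.

```python
import math

def h(u):
    """Left-hand side of Equation 3.7.2: u*cos(2u^2) + u + 1."""
    return u * math.cos(2.0 * u * u) + u + 1.0

def bisect(f, lo, hi, tol=1e-10):
    """Shrink a sign-changing bracket [lo, hi] until it is shorter than tol."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

def roots_in(f, a, b, steps=3000):
    """Scan [a, b] on a uniform grid and refine every sign change by bisection."""
    roots = []
    x0, f0 = a, f(a)
    for i in range(1, steps + 1):
        x1 = a + (b - a) * i / steps
        f1 = f(x1)
        if f0 * f1 < 0.0:
            roots.append(bisect(f, x0, x1))
        x0, f0 = x1, f1
    return roots

roots = roots_in(h, -3.0, 0.0)
for u in roots:
    # Each root u gives the parameter pair (u, v) = (u, 2u) of a critical point.
    print(f"u = {u:.3f}, v = 2u = {2 * u:.3f}")
```

Each root $u$ gives a point $(u, 2u)$ in the parameter plane, and applying the parametrization $g$ to it gives the corresponding constrained critical point on $X$.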

Notice that the same computation works if instead of $g$ we use

$$g_1 : \begin{pmatrix} u \\ v \end{pmatrix} \mapsto \begin{pmatrix} \sin uv \\ u + v \\ uv \end{pmatrix},$$

which gives the “surface” $X_1$ shown in Figure 3.7.5,
but this time the mapping $g_1$ is emphatically not a parametrization.

Figure 3.7.5. Left: The “surface” $X_1$ is the image of $g_1$; it is a subset of the
surface $Y$ of equation $x = \sin z$, which resembles a curved bench.
Right: The graph of $u\cos u^2 + u + 1 = 0$. The roots of that equation, marked
with black dots, give values of the parameter $u$ such that $\varphi(\mathbf{x}) = x + y + z$
restricted to $X_1$ has “critical points” at $g_1\begin{pmatrix} u \\ u \end{pmatrix}$. These are not true critical
points, because we have no definition of a critical point of a function restricted
to an object like $X_1$.

Since for any point $\begin{pmatrix} x \\ y \\ z \end{pmatrix} \in X_1$, $x = \sin z$, we see that $X_1$ is contained in the
surface $Y$ of equation $x = \sin z$, which is a graph of $x$ as a function of $z$. But $X_1$
covers only part of $Y$, since $y^2 - 4z = (u + v)^2 - 4uv = (u - v)^2 \ge 0$; it only covers
the part where $y^2 - 4z \ge 0$,¹⁹ and it covers it twice, since $g_1\begin{pmatrix} u \\ v \end{pmatrix} = g_1\begin{pmatrix} v \\ u \end{pmatrix}$.
The mapping $g_1$ folds the $(u, v)$-plane over along the diagonal, and pastes the
resulting half-plane onto the graph $Y$. Since $g_1$ is not one to one, it does not
qualify as a parametrization (see Definition 3.1.21; it also fails to qualify because
its derivative is not one to one. Can you justify this last statement?²⁰)

Exercise 3.7.2 asks you to show that the function $\varphi$ has no critical points
on $Y$: the plane of equation $x + y + z = c$ is never tangent to $Y$. But if you
follow the same procedure as above, you will find that critical points of $\varphi \circ g_1$
occur when $u = v$, and $u\cos u^2 + u + 1 = 0$. What has happened? The critical
points of $\varphi \circ g_1$ now correspond to “fold points,” where the plane $x + y + z = c$
is tangent not to the surface $Y$, nor to the “surface” $X_1$, whatever that would
mean, but to the curve that is the image of $u = v$ by $g_1$. △

¹⁹ We realized that $y^2 - 4z = (u + v)^2 - 4uv = (u - v)^2 \ge 0$ while trying to
understand the shape of $X_1$, which led to trying to understand the constraint imposed
by the relationship of the second and third variables.

²⁰ The derivative of $g_1$ is $\begin{bmatrix} v\cos uv & u\cos uv \\ 1 & 1 \\ v & u \end{bmatrix}$; at points where $u = v$, the columns of
this matrix are not linearly independent.

Recall that the Greek $\lambda$ is pronounced “lambda.”

We call $F_1, \dots, F_m$ constraint functions because they define the manifold to
which $\varphi$ is restricted.

In our three examples of Lagrange multipliers, our constraint manifold is
defined by a scalar-valued function $F$, not by a vector-valued function
$\mathbf{F} = \begin{bmatrix} F_1 \\ \vdots \\ F_m \end{bmatrix}$. But the proof of the spectral theorem (Theorem 3.7.12) involves a
vector-valued function.


Lagrange multipliers 


The proof of Theorem 3.7.1 relies on the parametrization $g$ of $X$. What if you
know a manifold only by equations? In this case, we can restate the theorem.

Suppose we are trying to maximize a function $\varphi$ on a manifold $X$. Suppose
further that we know $X$ not by a parametrization but by a vector-valued
equation, $\mathbf{F}(\mathbf{x}) = 0$, where $\mathbf{F} = \begin{bmatrix} F_1 \\ \vdots \\ F_m \end{bmatrix}$ goes from an open subset $U$ of $\mathbb{R}^n$ to $\mathbb{R}^m$,
and $[\mathbf{D}\mathbf{F}(\mathbf{x})]$ is onto for every $\mathbf{x} \in X$.

Then, as stated by Theorem 3.2.7, for any $\mathbf{a} \in X$, the tangent space $T_{\mathbf{a}}X$ is
the kernel of $[\mathbf{D}\mathbf{F}(\mathbf{a})]$:

$$T_{\mathbf{a}}X = \ker[\mathbf{D}\mathbf{F}(\mathbf{a})]. \tag{3.7.3}$$

So Theorem 3.7.1 asserts that for a mapping $\varphi : U \to \mathbb{R}$, at a critical point of
$\varphi$ on $X$, we have

$$\ker[\mathbf{D}\mathbf{F}(\mathbf{a})] \subset \ker[\mathbf{D}\varphi(\mathbf{a})]. \tag{3.7.4}$$

This can be reinterpreted as follows.

Theorem 3.7.6 (Lagrange multipliers). Let $X$ be a manifold known by
a vector-valued function $\mathbf{F}$. If $\varphi$ restricted to $X$ has a critical point at $\mathbf{a} \in X$,
then there exist numbers $\lambda_1, \dots, \lambda_m$ such that the derivative of $\varphi$ at $\mathbf{a}$ is a
linear combination of derivatives of the constraint functions:

$$[\mathbf{D}\varphi(\mathbf{a})] = \lambda_1[\mathbf{D}F_1(\mathbf{a})] + \dots + \lambda_m[\mathbf{D}F_m(\mathbf{a})]. \tag{3.7.5}$$

The numbers $\lambda_1, \dots, \lambda_m$ are called Lagrange multipliers.

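Example 3.7.7 (Lagrange multipliers: a simple example). Suppose we
want to maximize $\varphi\begin{pmatrix} x \\ y \end{pmatrix} = x + y$ on the ellipse $x^2 + 2y^2 = 1$. We have

$$F\begin{pmatrix} x \\ y \end{pmatrix} = x^2 + 2y^2 - 1, \quad \text{and} \quad \left[\mathbf{D}F\begin{pmatrix} x \\ y \end{pmatrix}\right] = [2x, 4y], \tag{3.7.6}$$

while $\left[\mathbf{D}\varphi\begin{pmatrix} x \\ y \end{pmatrix}\right] = [1, 1]$. So at a maximum, there will exist $\lambda$ such that

$$[1, 1] = \lambda[2x, 4y]; \quad \text{i.e.,} \quad x = \frac{1}{2\lambda}, \quad y = \frac{1}{4\lambda}. \tag{3.7.7}$$

Inserting these values into the equation for the ellipse gives

$$\frac{1}{4\lambda^2} + 2\,\frac{1}{16\lambda^2} = 1; \quad \text{i.e.,} \quad \lambda = \pm\sqrt{\frac{3}{8}}. \tag{3.7.8}$$

So the maximum of the function on the ellipse (taking $\lambda > 0$) is

$$\underbrace{\sqrt{\frac{2}{3}}}_{x} + \underbrace{\frac{1}{2}\sqrt{\frac{2}{3}}}_{y} = \frac{3}{2}\sqrt{\frac{2}{3}} = \sqrt{\frac{3}{2}}. \qquad \triangle \tag{3.7.9}$$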
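
As a sanity check on Example 3.7.7, the sketch below solves the Lagrange system in closed form ($x = 1/(2\lambda)$, $y = 1/(4\lambda)$, so the ellipse equation gives $\lambda^2 = 3/8$) and compares the resulting value of $x + y$ with a brute-force maximum over a parametrization of the ellipse. The parametrization $x = \cos t$, $y = \sin t/\sqrt{2}$ and the grid size are assumptions made for illustration.

```python
import math

# Closed-form solution of [1, 1] = lambda * [2x, 4y] together with x^2 + 2y^2 = 1.
lam = math.sqrt(3.0 / 8.0)
x, y = 1.0 / (2.0 * lam), 1.0 / (4.0 * lam)
constraint_value = x * x + 2.0 * y * y   # should equal 1
lagrange_max = x + y                     # should equal sqrt(3/2)

def brute_force_max(n=20001):
    """Maximize x + y over n sample points of the ellipse x = cos t, y = sin t / sqrt(2)."""
    best = -math.inf
    for i in range(n):
        t = 2.0 * math.pi * i / n
        best = max(best, math.cos(t) + math.sin(t) / math.sqrt(2.0))
    return best

sampled_max = brute_force_max()
print(lagrange_max, sampled_max)
```

The sampled maximum can never exceed the true maximum, and with this grid it agrees with $\sqrt{3/2} \approx 1.2247$ to several decimal places; taking $\lambda = -\sqrt{3/8}$ instead gives the minimum $-\sqrt{3/2}$.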


Disjointly means having nothing in common.

Figure 3.7.6. The combined area of the two shaded squares is 1; we wish to
find the smallest rectangle that will contain them both.

Example 3.7.8 (Lagrange multipliers: a somewhat harder example).
What is the smallest number $A$ such that any two squares $S_1, S_2$ of total area
1 can be put disjointly into a rectangle of area $A$?

Let us call $a$ and $b$ the lengths of the sides of $S_1$ and $S_2$, and we may assume
that $a \ge b \ge 0$. Then the smallest rectangle that will contain the two squares
disjointly has sides $a$ and $a + b$, and area $a(a + b)$, as shown in Figure 3.7.6. The
problem is to maximize the area $a^2 + ab$, subject to the constraints $a^2 + b^2 = 1$,
and $a \ge b \ge 0$.

The Lagrange multiplier theorem tells us that at a critical point of the
constrained function there exists a number $\lambda$ such that

$$\underbrace{[2a + b,\ a]}_{\text{deriv. of area function}} = \lambda \underbrace{[2a,\ 2b]}_{\text{deriv. of constraint func.}}. \tag{3.7.10}$$

So we need to solve the system of three simultaneous nonlinear equations

$$2a + b = 2a\lambda, \quad a = 2b\lambda, \quad a^2 + b^2 = 1. \tag{3.7.11}$$

Substituting the value of $a$ from the second equation into the first, we find

$$4b\lambda^2 - 4b\lambda - b = 0. \tag{3.7.12}$$

This has one solution $b = 0$, but then we get $a = 0$, which is incompatible with
$a^2 + b^2 = 1$. The other solution is

$$4\lambda^2 - 4\lambda - 1 = 0; \quad \text{i.e.,} \quad \lambda = \frac{1 \pm \sqrt{2}}{2}. \tag{3.7.13}$$

Our remaining equations are now

$$\frac{a}{b} = 2\lambda = 1 \pm \sqrt{2} \quad \text{and} \quad a^2 + b^2 = 1, \tag{3.7.14}$$

which, if we require $a, b \ge 0$, have the unique solution

$$\begin{pmatrix} a \\ b \end{pmatrix} = \frac{1}{\sqrt{4 + 2\sqrt{2}}}\begin{pmatrix} 1 + \sqrt{2} \\ 1 \end{pmatrix}. \tag{3.7.15}$$

This satisfies the constraint $a \ge b \ge 0$, and leads to

$$A = a(a + b) = \frac{4 + 3\sqrt{2}}{4 + 2\sqrt{2}}. \tag{3.7.16}$$

If we use $\lambda = \frac{1 - \sqrt{2}}{2}$, we end up with $a = \frac{1 - \sqrt{2}}{\sqrt{4 - 2\sqrt{2}}} < 0$.

The two endpoints correspond to the two extremes: all the area in one square
and none in the other, or both squares with the same area: at $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$, the larger
square has area 1, and the smaller rectangle has area 0; at $\begin{pmatrix} \sqrt{2}/2 \\ \sqrt{2}/2 \end{pmatrix}$, the
two squares are identical.


Note that Equation 3.7.18 is a system of four equations in four unknowns:
$x, y, z, \lambda$. This is typical of what comes out of Lagrange multipliers except in
the very simplest cases: you land on a system of nonlinear equations.

But this problem isn't quite typical, because there are tricks available for
solving those equations. Often there are none, and the only thing to do is to
use Newton's method.

You are asked in Exercise 3.7.3 to show that the other critical points are
saddles.

We must check (see Remark 3.6.5) that the maximum is not achieved at the
endpoints of the constraint region, i.e., at the point with coordinates $a = 1, b = 0$
and the point with coordinates $a = b = \sqrt{2}/2$. It is easy to see that
$(a + b)a = 1$ at both of these endpoints, and since $\frac{4 + 3\sqrt{2}}{4 + 2\sqrt{2}} > 1$, this is the unique
maximum. △
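
The conclusion of Example 3.7.8 can be checked numerically. Writing $a = \cos t$, $b = \sin t$ enforces $a^2 + b^2 = 1$, and the condition $a \ge b \ge 0$ becomes $0 \le t \le \pi/4$; the sketch below maximizes the area $a(a + b)$ over a grid in $t$ and compares it with the closed form in Equation 3.7.16. The parametrization and grid are illustrative assumptions.

```python
import math

def area(t):
    """Area a*(a + b) of the enclosing rectangle, with a = cos t and b = sin t."""
    a, b = math.cos(t), math.sin(t)
    return a * (a + b)

def grid_max(f, lo, hi, n=10001):
    """Return (t, f(t)) maximizing f over n evenly spaced points of [lo, hi]."""
    best_t, best_value = lo, f(lo)
    for i in range(1, n):
        t = lo + (hi - lo) * i / (n - 1)
        value = f(t)
        if value > best_value:
            best_t, best_value = t, value
    return best_t, best_value

t_star, best_area = grid_max(area, 0.0, math.pi / 4)
closed_form = (4 + 3 * math.sqrt(2)) / (4 + 2 * math.sqrt(2))

print(t_star, best_area, closed_form)
print(area(0.0), area(math.pi / 4))  # the two endpoints of the constraint region
```

The maximizer sits at $t = \pi/8$, where $b/a = \tan(\pi/8) = \sqrt{2} - 1 = 1/(1 + \sqrt{2})$, in agreement with Equation 3.7.14, and both endpoints give area 1.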


Example 3.7.9 (Lagrange multipliers: a third example). Find the critical
points of the function $xyz$ on the plane of equation

$$F\begin{pmatrix} x \\ y \\ z \end{pmatrix} = x + 2y + 3z - 1 = 0. \tag{3.7.17}$$

Theorem 3.7.6 asserts that a critical point is a solution to

$$\begin{aligned}
&(1)\quad \underbrace{[yz,\ xz,\ xy]}_{\text{deriv. of function } xyz} = \lambda \underbrace{[1,\ 2,\ 3]}_{\text{deriv. of } F \text{ (constraint)}} \\
&(2)\quad \underbrace{x + 2y + 3z = 1}_{\text{constraint equation}}
\end{aligned}
\qquad \text{or} \qquad
\begin{aligned}
yz &= \lambda \\ xz &= 2\lambda \\ xy &= 3\lambda \\ 1 &= x + 2y + 3z.
\end{aligned} \tag{3.7.18}$$

In this case, there are tricks available. It is not hard to derive $xz = 2yz$ and
$xy = 3yz$, so if $z \ne 0$ and $y \ne 0$, then $y = x/2$ and $z = x/3$. Substituting these
values into the last equation gives $x = 1/3$, hence $y = 1/6$ and $z = 1/9$. At this
point, the function has the value 1/162.

Now we need to examine the cases where $z = 0$ or $y = 0$. If $z = 0$, then our
Lagrange multiplier equation reads

$$[0, 0, xy] = \lambda[1, 2, 3], \tag{3.7.19}$$

which says that $\lambda = 0$, so one of $x$ or $y$ must also vanish. Suppose $y = 0$; then
$x = 1$, and the value of the function is 0. There are two other similar points.
Let us summarize: there are four critical points,

$$\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 1/2 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 0 \\ 1/3 \end{pmatrix}, \quad \begin{pmatrix} 1/3 \\ 1/6 \\ 1/9 \end{pmatrix}; \tag{3.7.20}$$

at the first three our function is 0 and at the last it is 1/162.
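
The four critical points of Example 3.7.9 can be verified exactly. The sketch below checks, in rational arithmetic, that each point lies on the plane $x + 2y + 3z = 1$ and that $[yz, xz, xy]$ is a multiple of $[1, 2, 3]$; proportionality is tested by requiring the cross product of the two vectors to vanish, which is one convenient way (assumed here) of encoding the existence of $\lambda$.

```python
from fractions import Fraction as Fr

def is_constrained_critical(point):
    """True if point is on x + 2y + 3z = 1 and [yz, xz, xy] = lambda * [1, 2, 3]."""
    x, y, z = point
    g1, g2, g3 = y * z, x * z, x * y          # derivative of xyz
    on_plane = (x + 2 * y + 3 * z == 1)
    # The gradient is a multiple of (1, 2, 3) exactly when this cross product is zero.
    cross = (g2 * 3 - g3 * 2, g3 * 1 - g1 * 3, g1 * 2 - g2 * 1)
    return on_plane and cross == (0, 0, 0)

candidates = [
    (Fr(1), Fr(0), Fr(0)),
    (Fr(0), Fr(1, 2), Fr(0)),
    (Fr(0), Fr(0), Fr(1, 3)),
    (Fr(1, 3), Fr(1, 6), Fr(1, 9)),
]

for p in candidates:
    x, y, z = p
    print(p, is_constrained_critical(p), x * y * z)
```

At the first three points the derivative of $xyz$ is zero (so $\lambda = 0$), while at the fourth it equals $\frac{1}{54}[1, 2, 3]$ and the function takes the value 1/162.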

Is our last point a maximum? The answer is yes (at least, it is a local
maximum), and you can see it as follows. The part of the plane of equation
$x + 2y + 3z = 1$ that lies in the first octant $x, y, z \ge 0$ is compact, as $|x|, |y|, |z| \le 1$
there; otherwise the equation of the plane cannot be satisfied. So our function
does have a maximum in that octant. In order to be sure that this maximum
is a critical point, we need to check that it isn't on the edge of the octant
(see Remark 3.6.5). That is straightforward, since the function vanishes on the
boundary, while it is positive at the fourth point. So this maximum is a critical
point, hence it must be our fourth point.

A space with more constraints is smaller than a space with fewer constraints:
more people belong to the set of musicians than belong to the set of red-headed,
left-handed cello players with last names beginning with W. Here, $A\vec{x} = 0$
imposes $m$ constraints, and $\beta\vec{x} = 0$ imposes only one.

Proof of Theorem 3.7.6. Since $T_{\mathbf{a}}X = \ker[\mathbf{D}\mathbf{F}(\mathbf{a})]$, the theorem follows
from Theorem 3.7.1 and from the following lemma from linear algebra, using
$A = [\mathbf{D}\mathbf{F}(\mathbf{a})]$ and $\beta = [\mathbf{D}\varphi(\mathbf{a})]$.

Lemma 3.7.10. Let $A = \begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_m \end{bmatrix} : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation (i.e.,
an $m \times n$ matrix), and $\beta : \mathbb{R}^n \to \mathbb{R}$ be a linear function (a row matrix $n$ wide).
Then

$$\ker A \subset \ker \beta \tag{3.7.21}$$

if and only if there exist numbers $\lambda_1, \dots, \lambda_m$ such that

$$\beta = \lambda_1\alpha_1 + \dots + \lambda_m\alpha_m. \tag{3.7.22}$$

This is simply saying that the only linear consequences one can draw from a
system of linear equations are the linear combinations of those equations.

We don't know anything about the relationship of $n$ and $m$, but we know
that $k \le n$, since $n + 1$ vectors in $\mathbb{R}^n$ cannot be linearly independent.


Proof of Lemma 3.7.10. In one direction, if

$$\beta = \lambda_1\alpha_1 + \dots + \lambda_m\alpha_m, \tag{3.7.23}$$

and $\vec{v} \in \ker A$, then $\vec{v} \in \ker \alpha_i$ for $i = 1, \dots, m$, so $\vec{v} \in \ker \beta$.

Unfortunately, this isn't the important direction, and the other is a bit
harder. Choose a maximal linearly independent subset of the $\alpha_i$; by ordering
we can suppose that these are $\alpha_1, \dots, \alpha_k$. Denote the set $A'$. Then

$$\ker \underbrace{\begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_k \end{bmatrix}}_{A'} = \ker \underbrace{\begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_m \end{bmatrix}}_{A}. \tag{3.7.24}$$

(Anything in the kernel of $\alpha_1, \dots, \alpha_k$ is also in the kernel of their linear
combinations.)

If $\beta$ is not a linear combination of the $\alpha_1, \dots, \alpha_m$, then it is not a linear
combination of $\alpha_1, \dots, \alpha_k$. This means that the $(k + 1) \times n$ matrix

$$B = \begin{bmatrix} \alpha_{1,1} & \dots & \alpha_{1,n} \\ \vdots & & \vdots \\ \alpha_{k,1} & \dots & \alpha_{k,n} \\ \beta_1 & \dots & \beta_n \end{bmatrix} \tag{3.7.25}$$

has $k + 1$ linearly independent rows, hence $k + 1$ linearly independent columns:
the linear transformation $B : \mathbb{R}^n \to \mathbb{R}^{k+1}$ is onto. Then the set of equations

$$\begin{bmatrix} \alpha_{1,1} & \dots & \alpha_{1,n} \\ \vdots & & \vdots \\ \alpha_{k,1} & \dots & \alpha_{k,n} \\ \beta_1 & \dots & \beta_n \end{bmatrix} \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} \tag{3.7.26}$$

has a nonzero solution. The first $k$ lines say that $\vec{v}$ is in $\ker A'$, which is equal
to $\ker A$, but the last line says that it is not in $\ker \beta$. □

Generalizing the spectral theorem to infinitely many dimensions is one of the
central problems of functional analysis.

For example,

$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 2 \\ 1 & 2 & 9 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \underbrace{x_1^2 + x_2^2 + 2x_1x_3 + 4x_2x_3 + 9x_3^2}_{\text{quadratic form}}.$$

Square matrices exist that have no eigenvectors, or only one. Symmetric
matrices are a very special class of square matrices, whose eigenvectors are
guaranteed not only to exist, but also to form an orthonormal basis.

The theory of eigenvalues and eigenvectors is the most exciting chapter in
linear algebra, with close connections to differential equations, Fourier series

“...when Werner Heisenberg discovered ‘matrix’ mechanics in 1925, he didn't
know what a matrix was (Max Born had to tell him), and neither Heisenberg nor
Born knew what to make of the appearance of matrices in the context of the
atom. (David Hilbert is reported to have told them to go look for a differential
equation with the same eigenvalues, if that would make them happier. They did
not follow Hilbert's well-meant advice and thereby may have missed discovering
the Schrödinger wave equation.)”
-M. R. Schroeder, Mathematical Intelligencer, Vol. 7, No. 4

The spectral theorem for symmetric matrices 

In this subsection we will prove what is probably the most important theorem of linear algebra. It goes under many names: the spectral theorem, the principal axis theorem, Sylvester’s principle of inertia. The theorem is a statement about symmetric matrices; recall (Definition 1.2.18) that a symmetric matrix is a matrix that is equal to its transpose. For us, the importance of symmetric matrices is that they represent quadratic forms:

Proposition 3.7.11 (Quadratic forms and symmetric matrices). For any symmetric matrix $A$, the function

$$Q_A(\mathbf x) = \mathbf x \cdot A\mathbf x \tag{3.7.27}$$

is a quadratic form; conversely, every quadratic form

$$Q(\mathbf x) = \sum_{I \in \mathcal{I}_n^2} a_I \mathbf x^I \tag{3.7.28}$$

is of the form $Q_A$ for a unique symmetric matrix $A$.

Actually, for any square matrix $M$ the function $Q_M(\mathbf x) = \mathbf x \cdot M\mathbf x$ is a quadratic form, but there is a unique symmetric matrix $A$ for which a quadratic form can be expressed as $Q_A$. This symmetric matrix is constructed as follows: each entry $A_{i,i}$ on the main diagonal is the coefficient of the corresponding variable squared in the quadratic form (i.e., the coefficient of $x_i^2$), while each entry $A_{i,j}$ with $i \neq j$ is one-half the coefficient of the term $x_i x_j$. For example, for the matrix at left, $A_{1,1} = 1$ because in the corresponding quadratic form the coefficient of $x_1^2$ is 1, while $A_{2,1} = A_{1,2} = 0$ because the coefficient of $x_2x_1 = x_1x_2$ is 0. Exercise 3.7.4 asks you to turn this into a formal proof.

Theorem 3.7.12 (Spectral theorem). Let $A$ be a symmetric $n \times n$ matrix with real entries. Then there exists an orthonormal basis $\mathbf v_1, \dots, \mathbf v_n$ of $\mathbb{R}^n$ and numbers $\lambda_1, \dots, \lambda_n$ such that

$$A\mathbf v_i = \lambda_i \mathbf v_i. \tag{3.7.29}$$


Definition 3.7.13 (Eigenvector, eigenvalue). For any square matrix $A$, a nonzero vector $\mathbf v$ such that $A\mathbf v = \lambda \mathbf v$ for some number $\lambda$ is called an eigenvector of $A$. The number $\lambda$ is the corresponding eigenvalue.
Exercise 3.7.5 asks you to justify the derivative in Equation 3.7.32, using the definition of a derivative as limit and the fact that $A$ is symmetric.


In Equation 3.7.34 we take the 
transpose of both sides, remem- 
bering that 

$(AB)^\top = B^\top A^\top$.

As often happens in the middle 
of an important proof, the point 
a at which we are evaluating the 
derivative has turned into a vec- 
tor, so that we can perform vector 
operations on it. 


We use $\lambda$ to denote both Lagrange multipliers and eigenvalues; we will see that eigenvalues are in fact Lagrange multipliers.

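Example 3.7.14 (Eigenvectors). Let $A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}$. You can easily check that

$$A \begin{bmatrix} \frac{1+\sqrt5}{2} \\ 1 \end{bmatrix} = \frac{1+\sqrt5}{2} \begin{bmatrix} \frac{1+\sqrt5}{2} \\ 1 \end{bmatrix} \quad\text{and}\quad A \begin{bmatrix} \frac{1-\sqrt5}{2} \\ 1 \end{bmatrix} = \frac{1-\sqrt5}{2} \begin{bmatrix} \frac{1-\sqrt5}{2} \\ 1 \end{bmatrix}, \tag{3.7.30}$$

and that the two vectors are orthogonal since their dot product is 0:

$$\begin{bmatrix} \frac{1+\sqrt5}{2} \\ 1 \end{bmatrix} \cdot \begin{bmatrix} \frac{1-\sqrt5}{2} \\ 1 \end{bmatrix} = \frac{1-5}{4} + 1 = 0. \tag{3.7.31}$$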


The matrix $A$ is symmetric; why do the eigenvectors $\mathbf v_1$ and $\mathbf v_2$ not form the basis referred to in the spectral theorem?²¹
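The claims of Example 3.7.14 can be checked numerically. The following is a minimal sketch in plain Python; the helper names (`matvec`, `dot`, `is_eigenpair`, `normalize`) are ours, introduced only for illustration, and the matrix is the $2 \times 2$ symmetric matrix with entries 1, 1, 1, 0 whose eigenvectors appear in the example.

```python
import math

# The symmetric matrix of Example 3.7.14 (rows listed one after the other).
A = [[1.0, 1.0],
     [1.0, 0.0]]

def matvec(M, v):
    """Product of a square matrix (nested lists) with a vector (list)."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def dot(u, v):
    """Standard dot product."""
    return sum(a * b for a, b in zip(u, v))

def is_eigenpair(M, lam, v, tol=1e-12):
    """True if M v = lam v up to the tolerance tol."""
    Mv = matvec(M, v)
    return all(abs(Mv[i] - lam * v[i]) < tol for i in range(len(v)))

def normalize(v):
    """Divide a nonzero vector by its length."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

lam1 = (1 + math.sqrt(5)) / 2
lam2 = (1 - math.sqrt(5)) / 2
v1 = [lam1, 1.0]
v2 = [lam2, 1.0]

print(is_eigenpair(A, lam1, v1), is_eigenpair(A, lam2, v2))
print(dot(v1, v2))                # orthogonal, up to rounding
print(dot(v1, v1), dot(v2, v2))   # neither vector has length 1
u1, u2 = normalize(v1), normalize(v2)
print(dot(u1, u1), dot(u2, u2))   # after normalizing, both have length 1
```

The vectors are orthogonal eigenvectors but not unit vectors, which is the point of the question above; dividing each by its length repairs this.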


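Proof of Theorem 3.7.12 (Spectral theorem). We will construct our basis one vector at a time. Consider the function $Q_A : \mathbb{R}^n \to \mathbb{R}$ given by $Q_A(\mathbf x) = \mathbf x \cdot A\mathbf x$. This function has a maximum (and a minimum) on the $(n-1)$-sphere $S$ of equation $F_1(\mathbf x) = |\mathbf x|^2 = 1$. We know a maximum (and a minimum) exists, because a sphere is a compact subset of $\mathbb{R}^n$; see Theorem 1.6.7. We have

$$[DQ_A(\mathbf a)]\mathbf h = \mathbf a \cdot (A\mathbf h) + \mathbf h \cdot (A\mathbf a) = \mathbf a^\top A\mathbf h + \mathbf h^\top A\mathbf a = 2\mathbf a^\top A\mathbf h, \tag{3.7.32}$$

whereas

$$[DF_1(\mathbf a)]\mathbf h = 2\mathbf a^\top \mathbf h. \tag{3.7.33}$$

So $2\mathbf a^\top A$ is the derivative of the quadratic form $Q_A$, and $2\mathbf a^\top$ is the derivative of the constraint function. Theorem 3.7.6 tells us that if the restriction of $Q_A$ to the unit sphere has a maximum at $\mathbf v_1$, then there exists $\lambda_1$ such that

$$2\mathbf v_1^\top A = \lambda_1 2\mathbf v_1^\top, \quad\text{so}\quad A^\top \mathbf v_1 = \lambda_1 \mathbf v_1. \tag{3.7.34}$$

Since $A$ is symmetric,

$$A\mathbf v_1 = \lambda_1 \mathbf v_1. \tag{3.7.35}$$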

This gives us our first eigenvector. Now let us continue by considering the maximum at $\mathbf v_2$ of $Q_A$ restricted to the space $S \cap (\mathbf v_1)^\perp$ (where, as above, $S$ is the unit sphere in $\mathbb{R}^n$, and $(\mathbf v_1)^\perp$ is the space of vectors perpendicular to $\mathbf v_1$).

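21 They don’t have unit length; if we normalize them by dividing each vector by its length, we find that

$$\mathbf v_1' = \frac{1}{\sqrt{\frac{5+\sqrt5}{2}}} \begin{bmatrix} \frac{1+\sqrt5}{2} \\ 1 \end{bmatrix} \quad\text{and}\quad \mathbf v_2' = \frac{1}{\sqrt{\frac{5-\sqrt5}{2}}} \begin{bmatrix} \frac{1-\sqrt5}{2} \\ 1 \end{bmatrix}$$

do the job.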
The first equality of Equation 3.7.39 uses the symmetry of $A$: if $A$ is symmetric,

$$\mathbf v \cdot (A\mathbf w) = \mathbf v^\top (A\mathbf w) = (\mathbf v^\top A)\mathbf w = (\mathbf v^\top A^\top)\mathbf w = (A\mathbf v)^\top \mathbf w = (A\mathbf v) \cdot \mathbf w.$$

The second uses Equation 3.7.35, and the third the fact that $\mathbf v_2 \in S \cap (\mathbf v_1)^\perp$.


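That is, we add a second constraint, $F_2$, maximizing $Q_A$ subject to the two constraints

$$F_1(\mathbf x) = 1 \quad\text{and}\quad F_2(\mathbf x) = \mathbf x \cdot \mathbf v_1 = 0. \tag{3.7.36}$$

Since $[DF_2(\mathbf v_2)] = \mathbf v_1^\top$, Equations 3.7.32 and 3.7.33 and Theorem 3.7.6 tell us that there exist numbers $\lambda_2$ and $\mu_{2,1}$ such that

$$A\mathbf v_2 = \mu_{2,1}\mathbf v_1 + \lambda_2 \mathbf v_2. \tag{3.7.37}$$

Take dot products of both sides of this equation with $\mathbf v_1$, to find

$$(A\mathbf v_2) \cdot \mathbf v_1 = \lambda_2 \mathbf v_2 \cdot \mathbf v_1 + \mu_{2,1} \mathbf v_1 \cdot \mathbf v_1. \tag{3.7.38}$$

Using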

If you’ve ever tried to find eigenvectors, you’ll be impressed by how easily their existence dropped out of Lagrange multipliers. Of course we could not have done this without the existence of the maximum and minimum of the function $Q_A$, guaranteed by the non-constructive Theorem 1.6.7. In addition, we’ve only proved existence: there is no obvious way to find these constrained maxima of $Q_A$.


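$$(A\mathbf v_2) \cdot \mathbf v_1 = \mathbf v_2 \cdot (A\mathbf v_1) = \mathbf v_2 \cdot (\lambda_1 \mathbf v_1) = 0, \tag{3.7.39}$$

Equation 3.7.38 becomes

$$0 = \mu_{2,1}|\mathbf v_1|^2 + \underbrace{\lambda_2 \mathbf v_2 \cdot \mathbf v_1}_{=0 \text{ since } \mathbf v_2 \perp \mathbf v_1} = \mu_{2,1}, \tag{3.7.40}$$

so Equation 3.7.37 becomes

$$A\mathbf v_2 = \lambda_2 \mathbf v_2. \tag{3.7.41}$$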

We have found our second eigenvector. 

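It should be clear how to continue, but let us spell it out for one further step. Suppose that the restriction of $Q_A$ to $S \cap \mathbf v_1^\perp \cap \mathbf v_2^\perp$ has a maximum at $\mathbf v_3$, i.e., maximize $Q_A$ subject to the three constraints

$$F_1(\mathbf x) = 1, \quad F_2(\mathbf x) = \mathbf x \cdot \mathbf v_1 = 0, \quad\text{and}\quad F_3(\mathbf x) = \mathbf x \cdot \mathbf v_2 = 0. \tag{3.7.42}$$

The same argument as above says that there then exist numbers $\lambda_3$, $\mu_{3,1}$ and $\mu_{3,2}$ such that

$$A\mathbf v_3 = \mu_{3,1}\mathbf v_1 + \mu_{3,2}\mathbf v_2 + \lambda_3 \mathbf v_3. \tag{3.7.43}$$

Dot this entire equation with $\mathbf v_1$ (resp. $\mathbf v_2$); you will find $\mu_{3,1} = \mu_{3,2} = 0$, and we find $A\mathbf v_3 = \lambda_3 \mathbf v_3$. □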
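The first step of the proof can be illustrated numerically in the plane, where the unit sphere is the unit circle. The sketch below is not the method of the proof (which is non-constructive); it is a brute-force grid search, with hypothetical helper names `quad_form` and `argmax_on_unit_circle`, applied to the matrix with rows (1, 1) and (1, 0). It suggests that the maximizer of $Q_A$ on the circle is, up to grid resolution, an eigenvector, and that the maximum value is the corresponding eigenvalue.

```python
import math

def quad_form(A, x):
    """Q_A(x) = x . Ax for a square matrix A given as nested lists."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def argmax_on_unit_circle(A, steps=100_000):
    """Grid search for the maximum of Q_A on the unit circle.

    Since Q_A(x) = Q_A(-x), it is enough to scan angles in [0, pi).
    """
    best_q, best_x = -math.inf, None
    for k in range(steps):
        t = math.pi * k / steps
        x = (math.cos(t), math.sin(t))
        q = quad_form(A, x)
        if q > best_q:
            best_q, best_x = q, x
    return best_x, best_q

A = [[1.0, 1.0],
     [1.0, 0.0]]
v1, lam1 = argmax_on_unit_circle(A)

# Residual of the eigenvector equation A v1 = lam1 v1 (small, not exactly 0).
Av1 = [A[i][0] * v1[0] + A[i][1] * v1[1] for i in range(2)]
residual = max(abs(Av1[i] - lam1 * v1[i]) for i in range(2))
print(lam1, (1 + math.sqrt(5)) / 2, residual)
```

A grid search only works in very low dimension; the sketch is meant to make the statement of the theorem concrete, not to suggest a practical way of finding eigenvectors.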


The spectral theorem gives us an alternative approach to quadratic forms, geometrically more appealing than the completing of squares used in Section 3.5.

Exercise 3.7.6 characterizes the norm in terms of eigenvalues.

Theorem 3.7.15. If the quadratic form $Q_A$ has signature $(k, l)$, then $A$ has $k$ positive eigenvalues and $l$ negative eigenvalues.
3.8 Geometry of Curves and Surfaces

In which we return to curves and surfaces, applying what we have learned about Taylor polynomials, quadratic forms, and extrema to discussing their geometry: in particular, their curvature.

A curve acquires its geometry from the space in which it is embedded. With- 
out that embedding, a curve is boring: geometrically it is a straight line. A 
one-dimensional worm living inside a smooth curve cannot tell whether the 
curve is straight or curvy; at most (if allowed to leave a trace behind him) it 
can tell whether the curve is closed or not. 

This is not true of surfaces and higher-dimensional manifolds. Given a long- 
enough tape measure you could prove that the earth is spherical without any 
recourse to ambient space; Exercise 3.8.1 asks you to compute how long a tape 
measure you would need. 

The central notion used to explore these issues is curvature, which comes in many flavors. Its importance cannot be overstated: gravitation is the curvature of spacetime; the electromagnetic field is the curvature of the electromagnetic potential. Indeed, the geometry of curves and surfaces is an immense field, with many hundreds of books devoted to it; our treatment cannot be more than the barest overview.²²

We will briefly discuss curvature as it applies to curves in the plane, curves 
in space and surfaces in space. Our approach is the same in all cases: we write 
our curve or surface as the graph of a mapping in the coordinates best adapted 
to the situation, and read the curvature (and other quantities of interest) from 
quadratic terms of the Taylor polynomial for that mapping. Differential geom- 
etry only exists for functions that are twice continuously differentiable; without 
that hypothesis, everything becomes a million times harder. Thus the functions 
we discuss all have Taylor polynomials of degree at least 2. (For curves in space, 
we will need our functions to be three times continuously differentiable, with 
Taylor polynomials of degree 3.) 

The geometry of plane curves 

For a smooth curve in the plane, the “best coordinate system” $X, Y$ at a point $\mathbf a = \begin{pmatrix} a \\ b \end{pmatrix}$ is the system centered at $\mathbf a$, with the $X$-axis in the direction of the tangent line, and the $Y$-axis normal to the tangent at that point, as shown in Figure 3.8.1.

22 For further reading, we recommend Riemannian Geometry, A Beginner's Guide, by Frank Morgan (A K Peters, Ltd., Wellesley, MA, second edition 1998) or Differential Geometry of Curves and Surfaces, by Manfredo P. do Carmo (Prentice-Hall, Inc., 1976).


Curvature in geometry mani- 
fests itself as gravitation. — C. Mis- 
ner, K. S. Thorne, J. Wheeler, 
Gravitation 


Recall (Remark 3.1.2) the fuzzy 
definition of “smooth” as meaning 
“as many times differentiable as is 
relevant to the problem at hand.” 
In Sections 3.1 and 3.2, once con- 
tinuously differentiable was suffi- 
cient; here it is not. 
Figure 3.8.1.

To study a smooth curve at $\mathbf a = \begin{pmatrix} a \\ b \end{pmatrix}$, we make $\mathbf a$ the origin of our new coordinates, and place the $X$-axis in the direction of the tangent to the curve at $\mathbf a$. Within the shaded region, the curve is the graph of a function $Y = g(X)$ that starts with quadratic terms.

The Greek letter $\kappa$ is “kappa.” We could avoid the absolute value by defining the signed curvature of an oriented curve, but we won’t do so here, to avoid complications.

When $X = 0$, both $g(X)$ and $g'(0)$ vanish, while $g''(0) = -1$; the quadratic term for the Taylor polynomial is $\frac12 g''$.


In these $X, Y$ coordinates, the curve is locally the graph of a function $Y = g(X)$, which can be approximated by its Taylor polynomial. This Taylor polynomial contains only quadratic and higher terms²³:

$$Y = g(X) = \frac{A_2}{2} X^2 + \frac{A_3}{6} X^3 + \cdots, \tag{3.8.1}$$

where $A_2$ is the second derivative of $g$ (see Equation 3.3.1). All the coefficients of this polynomial are invariants of the curve: numbers associated to a point of the curve that do not change if you translate or rotate the curve.

The curvature of plane curves 

The coefficient that will interest us is $A_2$, the second derivative of $g$.

Definition 3.8.1 (Curvature of a curve in $\mathbb{R}^2$). Let a curve in $\mathbb{R}^2$ be locally the graph of a function $g(X)$, with Taylor polynomial

$$g(X) = \frac{A_2}{2} X^2 + \frac{A_3}{6} X^3 + \cdots.$$

Then the curvature $\kappa$ of the curve at $\mathbf 0$ is $|A_2|$.

The curvature is normalized so that the unit circle has curvature 1. Indeed, near the point $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$, the “best coordinates” for the unit circle are $X = x$, $Y = y - 1$, so the equation of the circle $y = \sqrt{1 - x^2}$ becomes

$$g(X) = Y = y - 1 = \sqrt{1 - X^2} - 1 \tag{3.8.2}$$

with the Taylor polynomial²⁴

$$g(X) = -\frac12 X^2 + \cdots, \tag{3.8.3}$$

the dots representing higher degree terms. So the unit circle has curvature $|-1| = 1$.

Proposition 3.8.2 tells how to compute the curvature of a smooth plane curve 
that is locally the graph of the function f(x). Note that when we use small 
letters, x and y , we are using the standard coordinate system. 

23 The point $\mathbf a$ has coordinates $X = 0$, $Y = 0$, so the constant term is 0; the linear term is 0 because the curve is tangent to the $X$-axis at $\mathbf a$.

24 We avoided computing the derivatives for $g(X)$ by using the formula for the Taylor series of a binomial (Equation 3.4.7):

$$(1 + a)^m = 1 + ma + \frac{m(m-1)}{2!} a^2 + \frac{m(m-1)(m-2)}{3!} a^3 + \cdots.$$

In this case, $m$ is 1/2 and $a = -X^2$.
Proposition 3.8.2 (Computing the curvature of a plane curve known as a graph). The curvature $\kappa$ of the curve $y = f(x)$ at $\begin{pmatrix} a \\ f(a) \end{pmatrix}$ is

$$\kappa = \frac{|f''(a)|}{\left(1 + f'(a)^2\right)^{3/2}}. \tag{3.8.4}$$


The rotation matrix of Example 1.3.17:

$$\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

is the inverse of the one we are using now; there we were rotating points, while here we are rotating coordinates.

Proof. We express $f(x)$ as its Taylor polynomial, ignoring the constant term, since we can eliminate it by translating the coordinates, without changing any of the derivatives. This gives us

$$f(x) = f'(a)x + \frac{f''(a)}{2} x^2 + \cdots. \tag{3.8.5}$$

Now rotate the coordinates by $\theta$, using the rotation matrix

$$\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}. \tag{3.8.6}$$

Recall (Definition 3.8.1) that curvature is defined for a curve locally the graph of a function $g(X)$ whose Taylor polynomial starts with quadratic terms.


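Then

$$\begin{aligned} X &= x\cos\theta + y\sin\theta \\ Y &= -x\sin\theta + y\cos\theta \end{aligned} \quad\text{giving}\quad \begin{aligned} X\cos\theta - Y\sin\theta &= x(\cos^2\theta + \sin^2\theta) = x \\ X\sin\theta + Y\cos\theta &= y(\cos^2\theta + \sin^2\theta) = y. \end{aligned} \tag{3.8.7}$$

Substituting these into Equation 3.8.5 leads to

$$\underbrace{X\sin\theta + Y\cos\theta}_{y} = f'(a)\underbrace{(X\cos\theta - Y\sin\theta)}_{x} + \frac{f''(a)}{2}\underbrace{(X\cos\theta - Y\sin\theta)^2}_{x^2} + \cdots. \tag{3.8.8}$$

We want to choose $\theta$ so that this equation expresses $Y$ as a function of $X$, with derivative 0, so that its Taylor polynomial starts with the quadratic term:

$$Y = g(X) = \frac{A_2}{2} X^2 + \cdots. \tag{3.8.9}$$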

If we subtract $X\sin\theta + Y\cos\theta$ from both sides of Equation 3.8.8, we can write the equation for the curve in terms of the $X, Y$ coordinates:

$$F\begin{pmatrix} X \\ Y \end{pmatrix} = 0 = -X\sin\theta - Y\cos\theta + f'(a)(X\cos\theta - Y\sin\theta) + \cdots, \tag{3.8.10}$$

with derivative

$$[DF(\mathbf 0)] = [\underbrace{f'(a)\cos\theta - \sin\theta}_{D_1F},\ \underbrace{-f'(a)\sin\theta - \cos\theta}_{D_2F}]. \tag{3.8.11}$$

Alternatively, we could say that $X$ is a function of $Y$ if $D_1F$ is invertible.

Here, $D_2F = -f'(a)\sin\theta - \cos\theta$ corresponds to Equation 2.9.21 in the implicit function theorem; it represents the “pivotal columns” of the derivative of $F$. Since that derivative is a line matrix, $D_2F$ is a number; being nonzero and being invertible are the same.

The implicit function theorem says that $Y$ is a function $g(X)$ if $D_2F$ is invertible, i.e., if $-f'(a)\sin\theta - \cos\theta \neq 0$. In that case, Equation 2.9.25 for the derivative of an implicit function tells us that in order to have $g'(0) = 0$ (so that $g(X)$ starts with quadratic terms) we must have $D_1F = f'(a)\cos\theta - \sin\theta = 0$, i.e., $\tan\theta = f'(a)$:
Setting $\tan\theta = f'(a)$ is simply saying that $f'(a)$ is the slope of the curve.

$$g'(0) = 0 = -\underbrace{[D_2F(\mathbf 0)]^{-1}}_{\neq 0}\ \underbrace{[D_1F(\mathbf 0)]}_{\text{must be } 0}. \tag{3.8.12}$$

If we make this choice of $\theta$, then indeed

$$-f'(a)\sin\theta - \cos\theta = -\frac{1}{\cos\theta} \neq 0, \tag{3.8.13}$$

so the implicit function theorem does apply. We can replace $Y$ in Equation 3.8.10 by $g(X)$:

$$F\begin{pmatrix} X \\ g(X) \end{pmatrix} = 0 = -X\sin\theta - g(X)\cos\theta + f'(a)(X\cos\theta - g(X)\sin\theta) + \underbrace{\frac{f''(a)}{2}(X\cos\theta - g(X)\sin\theta)^2}_{\text{additional term; see Eq. 3.8.8}} + \cdots. \tag{3.8.14}$$


Figure 3.8.2. 

This right triangle justifies 
Equation 3.8.18. 


Since $g'(0) = 0$, $g(X)$ starts with quadratic terms. Moreover, by Theorem 3.4.7, the function $g$ is as differentiable as $F$, hence as $f$. So the term $Xg(X)$ is of degree 3, and the term $g(X)^2$ is of degree 4.


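If we group the linear terms in $g(X)$ on the left, and put the linear terms in $X$ on the right, we get

$$\begin{aligned} (f'(a)\sin\theta + \cos\theta)\,g(X) &= \underbrace{(f'(a)\cos\theta - \sin\theta)}_{=0} X + \frac{f''(a)}{2}\bigl(\cos\theta\, X - \sin\theta\, g(X)\bigr)^2 + \cdots \\ &= \frac{f''(a)}{2}\bigl(\cos\theta\, X - \sin\theta\, g(X)\bigr)^2 + \cdots. \end{aligned} \tag{3.8.15}$$

We divide by $f'(a)\sin\theta + \cos\theta$ to obtain

$$g(X) = \frac{1}{f'(a)\sin\theta + \cos\theta}\,\frac{f''(a)}{2}\Bigl(\cos^2\theta\, X^2 \underbrace{{} - 2\cos\theta\sin\theta\, Xg(X) + \sin^2\theta\,(g(X))^2}_{\text{these are of degree 3 or higher}}\Bigr) + \cdots. \tag{3.8.16}$$

Now express the coefficient of $X^2$ as $A_2/2$, getting

$$A_2 = \frac{f''(a)\cos^2\theta}{f'(a)\sin\theta + \cos\theta}. \tag{3.8.17}$$

Since $f'(a) = \tan\theta$, we have the right triangle of Figure 3.8.2, and

$$\sin\theta = \frac{f'(a)}{\sqrt{1 + (f'(a))^2}} \quad\text{and}\quad \cos\theta = \frac{1}{\sqrt{1 + (f'(a))^2}}. \tag{3.8.18}$$

Substituting these values in Equation 3.8.17 we have

$$A_2 = \frac{f''(a)}{\left(1 + (f'(a))^2\right)^{3/2}}, \quad\text{so that}\quad \kappa = |A_2| = \frac{|f''(a)|}{\left(1 + f'(a)^2\right)^{3/2}}. \quad\square \tag{3.8.19}$$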
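As a quick sanity check of Equation 3.8.4, the following sketch evaluates the formula for two graphs whose derivatives we supply by hand. The function name `curvature_of_graph` and the example derivative functions are ours, introduced only for illustration. For the upper half of the unit circle, $f(x) = \sqrt{1 - x^2}$, the formula should return 1 at every point, in agreement with the normalization discussed after Definition 3.8.1.

```python
import math

def curvature_of_graph(df, d2f, a):
    """Curvature of y = f(x) at x = a, by Equation 3.8.4.

    df and d2f are functions returning f'(x) and f''(x).
    """
    return abs(d2f(a)) / (1.0 + df(a) ** 2) ** 1.5

# Upper half of the unit circle: f(x) = sqrt(1 - x^2), for -1 < x < 1.
def circle_df(x):
    return -x / math.sqrt(1.0 - x * x)

def circle_d2f(x):
    return -1.0 / (1.0 - x * x) ** 1.5

# Parabola: f(x) = x^2.
def parabola_df(x):
    return 2.0 * x

def parabola_d2f(x):
    return 2.0

for a in (0.0, 0.3, -0.7):
    print(a, curvature_of_graph(circle_df, circle_d2f, a))   # close to 1

print(curvature_of_graph(parabola_df, parabola_d2f, 0.0))    # 2 at the vertex
print(curvature_of_graph(parabola_df, parabola_d2f, 1.0))    # 2 / 5^(3/2)
```

The parabola is most curved at its vertex and flattens out as $|x|$ grows, which is what the denominator $(1 + f'(a)^2)^{3/2}$ expresses.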



There is no reasonable gener- 
alization of this approach to sur- 
faces, which do have intrinsic ge- 
ometry. 


The vector $\gamma'$ is the velocity vector of the parametrization $\gamma$.

If the odometer says you have traveled 50 miles, then you have traveled 50 miles on your curve.

Computing the integral in Equation 3.8.22 is painful, and computing the inverse function $t(s)$ is even more so, so parametrization by arc length is more attractive in theory than in practice. Later we will see how to compute the curvature of curves known by arbitrary parametrizations.

Proposition 3.8.4 follows from 
Proposition 3.8.13. 
Geometry of curves parametrized by arc length

There is an alternative approach to the geometry of curves, both in the plane and in space: parametrization by arc length. The existence of this method reflects the fact that curves have no interesting intrinsic geometry: if you were a one-dimensional bug living on a curve, you could not make any measurements that would tell whether your universe was a straight line, or all tangled up.

Recall (Definition 3.1.20) that a parametrized curve is a mapping $\gamma : I \to \mathbb{R}^n$, where $I$ is an interval in $\mathbb{R}$. You can think of $I$ as an interval of time; if you are traveling along the curve, the parametrization tells you where you are on the curve at a given time.

Definition 3.8.3 (Arc length). The arc length of the segment $\gamma([a, b])$ of a curve parametrized by $\gamma$ is given by the integral

$$\int_a^b |\gamma'(t)|\,dt. \tag{3.8.20}$$



A more intuitive definition to consider is the lengths of straight line segments (“inscribed polygonal curves”) joining points $\gamma(t_0), \gamma(t_1), \dots, \gamma(t_m)$, where $t_0 = a$ and $t_m = b$, as shown in Figure 3.8.3. Then take the limit as the line segments become shorter and shorter.

In formulas, this means to consider

$$\sum_{i=0}^{m-1} |\gamma(t_{i+1}) - \gamma(t_i)|, \quad\text{which is almost}\quad \sum_{i=0}^{m-1} |\gamma'(t_i)|(t_{i+1} - t_i). \tag{3.8.21}$$

(If you have any doubts about the “which is almost,” Exercise 3.8.2 should remove them when $\gamma$ is twice continuously differentiable.) This last expression is a Riemann sum for $\int_a^b |\gamma'(t)|\,dt$.
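The two sums of Equation 3.8.21 can be compared numerically. The sketch below uses, as an assumed test curve of our own choosing, one turn of the helix $\gamma(t) = (\cos t, \sin t, t)$, whose speed is constantly $\sqrt2$, so that the arc length over $[0, 2\pi]$ is $2\pi\sqrt2$. The function names are hypothetical.

```python
import math

def gamma(t):
    """One assumed test curve: a helix in R^3."""
    return (math.cos(t), math.sin(t), t)

def gamma_prime(t):
    """Velocity vector of the helix."""
    return (-math.sin(t), math.cos(t), 1.0)

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def polygonal_length(curve, a, b, m):
    """Length of the inscribed polygon through m + 1 equally spaced parameters."""
    ts = [a + (b - a) * i / m for i in range(m + 1)]
    pts = [curve(t) for t in ts]
    return sum(
        norm(tuple(q - p for p, q in zip(pts[i], pts[i + 1])))
        for i in range(m)
    )

def riemann_sum(velocity, a, b, m):
    """Left Riemann sum of the speed |gamma'(t)| over [a, b]."""
    h = (b - a) / m
    return sum(norm(velocity(a + i * h)) * h for i in range(m))

exact = 2.0 * math.pi * math.sqrt(2.0)
for m in (10, 100, 1000):
    print(m, polygonal_length(gamma, 0.0, 2.0 * math.pi, m),
          riemann_sum(gamma_prime, 0.0, 2.0 * math.pi, m), exact)
```

For this curve the Riemann sum is exact for every $m$ because the speed is constant, while the polygonal length approaches the arc length from below as $m$ grows.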

If you select an origin $\gamma(t_0)$, then you can define $s(t)$ by the formula

$$\underbrace{s(t)}_{\substack{\text{odometer reading} \\ \text{at time } t}} = \int_{t_0}^{t} \underbrace{|\gamma'(u)|}_{\substack{\text{speedometer reading} \\ \text{at time } u}}\,du; \tag{3.8.22}$$

$s(t)$ gives the odometer reading as a function of time (“how far have you gone since time $t_0$?”). It is a monotonically increasing function, so (Theorem 2.9.2) it has an inverse function $t(s)$ (at what time had you gone distance $s$ on the curve?). Composing this function with $\gamma : I \to \mathbb{R}^2$ or $\gamma : I \to \mathbb{R}^3$ now says where you are in the plane, or in space, when you have gone a distance $s$ along the curve (or, if $\gamma : I \to \mathbb{R}^n$, where you are in $\mathbb{R}^n$). The curve

$$\delta(s) = \gamma(t(s)) \tag{3.8.23}$$

is now parametrized by arc length: distances along the curve are exactly the same as they are in the parameter domain where $s$ lives.
Figure 3.8.3.

A curve approximated by an inscribed polygon. While you may be more familiar with closed polygons, such as the hexagon and pentagon, a polygon does not need to be closed.

In Equation 3.8.25, the first index for the coefficient $A$ refers to $X$ and the second to $Y$, so $A_{1,1}$ is the coefficient for $XY$, and so on.

Figure 3.8.4.

In an adapted coordinate system, a surface is represented as the graph of a function from the tangent plane to the normal line. In those coordinates, the function starts with quadratic terms.

Proposition 3.8.4 (Curvature of a plane curve parametrized by arc length). The curvature $\kappa$ of a plane curve $\delta(s)$ parametrized by arc length is given by the formula

$$\kappa(\delta(s)) = |\delta''(s)|. \tag{3.8.24}$$


The best coordinates for surfaces 


Let $S$ be a surface in $\mathbb{R}^3$, and let $\mathbf a$ be a point in $S$. Then an adapted coordinate system for $S$ at $\mathbf a$ is a system where $X$ and $Y$ are coordinates with respect to an orthonormal basis of the tangent plane, and the $Z$-axis is the normal direction, as shown in Figure 3.8.4. In such a coordinate system, the surface $S$ is locally the graph of a function

$$Z = f\begin{pmatrix} X \\ Y \end{pmatrix} = \underbrace{\frac12\left(A_{2,0}X^2 + 2A_{1,1}XY + A_{0,2}Y^2\right)}_{\text{quadratic term of Taylor polynomial}} + \text{higher degree terms}. \tag{3.8.25}$$

Many interesting things can be read off from the numbers $A_{2,0}$, $A_{1,1}$ and $A_{0,2}$: in particular, the mean curvature and the Gaussian curvature, both generalizations of the single curvature of smooth curves.

Definition 3.8.5 (Mean curvature of a surface). The mean curvature $H$ of a surface at a point $\mathbf a$ is

$$H = \frac12\left(A_{2,0} + A_{0,2}\right).$$

The mean curvature measures how far a surface is from being minimal. A minimal surface is one that locally minimizes surface area among surfaces with the same boundary.

Definition 3.8.6 (Gaussian curvature of a surface). The Gaussian curvature $K$ of a surface at a point $\mathbf a$ is

$$K = A_{2,0}A_{0,2} - A_{1,1}^2. \tag{3.8.26}$$


The Gaussian curvature measures how big or small a surface is compared to a flat surface. The precise statement, which we will not prove in this book, is that the area of the disk $D_r(\mathbf x)$ of radius $r$ around a point $\mathbf x$ of a surface has the 4th degree Taylor polynomial

$$\underbrace{\operatorname{Area}(D_r(\mathbf x))}_{\text{area of curved disk}} \approx \underbrace{\pi r^2}_{\text{area of flat disk}} - \frac{K(\mathbf x)\pi}{12} r^4. \tag{3.8.27}$$

Sewing is something of a dying art, but the mathematician Bill Thurston, whose geometric vision is legendary, maintains that it is an excellent way to acquire some feeling for the geometry of surfaces.


If the curvature is positive, the curved disk is smaller than a flat disk, and 
if the curvature is negative, it is larger. The disks have to be measured with 
a tape measure contained in the surface; in other words, D r (x) is the set of 
points which can be connected to x by a curve contained in the surface and of 
length at most r. 
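Equation 3.8.27 can be tested on the one curved surface where the area of such a disk has a simple closed form. The sketch below relies on two standard facts that are not derived in this section and are stated here as assumptions: the unit sphere has Gaussian curvature $K = 1$, and the disk of radius $r$ measured along the unit sphere (a spherical cap) has area $2\pi(1 - \cos r)$. The function names are ours.

```python
import math

def geodesic_disk_area_unit_sphere(r):
    """Area of the disk of radius r measured along the unit sphere (a cap).

    Assumed standard formula: 2*pi*(1 - cos r).
    """
    return 2.0 * math.pi * (1.0 - math.cos(r))

def taylor_area(r, K):
    """Fourth degree Taylor polynomial of Equation 3.8.27."""
    return math.pi * r ** 2 - K * math.pi / 12.0 * r ** 4

for r in (0.5, 0.1, 0.01):
    exact = geodesic_disk_area_unit_sphere(r)
    approx = taylor_area(r, K=1.0)
    flat = math.pi * r ** 2
    # The curved disk is smaller than the flat one, and the Taylor
    # polynomial matches it up to a term of order r^6.
    print(r, exact, approx, flat, exact - approx)
```

With $K = 1 > 0$ the curved disk is smaller than the flat disk of the same radius, as the text asserts for positive curvature.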

An obvious example of a surface with positive Gaussian curvature is the 
surface of a ball. Take a basketball and wrap a napkin around it; you will have 
extra fabric that won’t lie smooth. This is why maps of the earth always distort 
areas: the extra “fabric” won’t lie smooth otherwise.

An example of a surface with negative Gaussian curvature is a mountain 
pass. Another example is an armpit. If you have ever sewed a set-in sleeve on a 
shirt or dress, you know that when you pin the under part of the sleeve to the 
main part of the garment, you have extra fabric that doesn’t lie flat; sewing the 
two parts together without puckers or gathers is tricky, and involves distorting 
the fabric. 


The Gaussian curvature is the 
prototype of all the really interest- 
ing things in differential geometry. 
It measures to what extent pieces 
of a surface can be made flat, with- 
out stretching or deformation — as 
is possible for a cone or cylinder 
but not for a sphere. 





FIGURE 3.8.5. Did you ever wonder why the three Billy Goats Gruff were the sizes 
they were? The answer is Gaussian curvature. The first goat gets just the right 
amount of grass to eat; he lives on a flat surface, with Gaussian curvature zero. The 
second goat is thin. He lives on the top of a hill, with positive Gaussian curvature. 
Since the chain is heavy, and lies on the surface, he can reach less grass. The third 
goat is fat. His surface has negative Gaussian curvature; with the same length chain, 
he can get at more grass. 
Computing curvature of surfaces

Proposition 3.8.2 tells how to compute the curvature of a plane curve known as a graph. The analog for surfaces is a pretty frightful computation. Suppose we have a surface $S$, given as the graph of a function $f\begin{pmatrix} x \\ y \end{pmatrix}$, of which we have written the Taylor polynomial to degree 2:

$$z = f\begin{pmatrix} x \\ y \end{pmatrix} = a_1 x + a_2 y + \frac12\left(a_{2,0}x^2 + 2a_{1,1}xy + a_{0,2}y^2\right) + \cdots. \tag{3.8.28}$$

(There is no constant term because we translate the surface so that the point we are interested in is the origin.)

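A coordinate system adapted to $S$ at the origin is the following system, where we set $c = \sqrt{a_1^2 + a_2^2}$ to lighten the notation:

$$\begin{aligned} x &= -\frac{a_2}{c} X + \frac{a_1}{c\sqrt{1+c^2}} Y + \frac{a_1}{\sqrt{1+c^2}} Z \\ y &= \phantom{-}\frac{a_1}{c} X + \frac{a_2}{c\sqrt{1+c^2}} Y + \frac{a_2}{\sqrt{1+c^2}} Z \\ z &= \phantom{-\frac{a_1}{c} X + {}} \frac{c}{\sqrt{1+c^2}} Y - \frac{1}{\sqrt{1+c^2}} Z. \end{aligned} \tag{3.8.29}$$

That is, the new coordinates are taken with respect to the three basis vectors

$$\begin{bmatrix} -\frac{a_2}{c} \\ \frac{a_1}{c} \\ 0 \end{bmatrix}, \quad \begin{bmatrix} \frac{a_1}{c\sqrt{1+c^2}} \\ \frac{a_2}{c\sqrt{1+c^2}} \\ \frac{c}{\sqrt{1+c^2}} \end{bmatrix}, \quad \begin{bmatrix} \frac{a_1}{\sqrt{1+c^2}} \\ \frac{a_2}{\sqrt{1+c^2}} \\ \frac{-1}{\sqrt{1+c^2}} \end{bmatrix}. \tag{3.8.30}$$

The first vector is a horizontal unit vector in the tangent plane. The second is a unit vector orthogonal to the first, in the tangent plane. The third is a unit vector orthogonal to the previous two. It takes a bit of geometry to find them, but the proof of Proposition 3.8.7 will show that these coordinates are indeed adapted to the surface.

Remember that we set $c = \sqrt{a_1^2 + a_2^2}$.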


Note that Equations 3.8.33 and 3.8.34 are somehow related to Equation 3.8.4: in each case the numerator contains second derivatives ($a_{2,0}$, $a_{0,2}$, etc., are coefficients for the second degree terms of the Taylor polynomial) and the denominator contains something like $1 + |Df|^2$ (the $a_1$ and $a_2$ of $c = \sqrt{a_1^2 + a_2^2}$ are coefficients of the first degree term). A more precise relation can be seen if you consider the surface of equation $z = f(x)$, $y$ arbitrary, and the plane curve $z = f(x)$. In that case the mean curvature of the surface is half the curvature of the plane curve. Exercise 3.8.3 asks you to check this.

We prove Proposition 3.8.7 af- 
ter giving a few examples. 


Proposition 3.8.7 (Computing curvature of surfaces). (a) Let $S$ be the surface of Equation 3.8.28, and $X, Y, Z$ be the coordinates with respect to the orthonormal basis given by Equation 3.8.30. With respect to these coordinates, $S$ is the graph of $Z$ as a function $F$ of $X$ and $Y$:

$$F\begin{pmatrix} X \\ Y \end{pmatrix} = \frac12\left(A_{2,0}X^2 + 2A_{1,1}XY + A_{0,2}Y^2\right) + \cdots, \tag{3.8.31}$$

which starts with quadratic terms.

(b) Setting $c = \sqrt{a_1^2 + a_2^2}$, the coefficients for the quadratic terms of $F$ are

$$\begin{aligned} A_{2,0} &= -\frac{1}{c^2\sqrt{1+c^2}}\left(a_{2,0}a_2^2 - 2a_{1,1}a_1a_2 + a_{0,2}a_1^2\right) \\ A_{1,1} &= \frac{1}{c^2(1+c^2)}\left(a_1a_2(a_{2,0} - a_{0,2}) + a_{1,1}(a_2^2 - a_1^2)\right) \\ A_{0,2} &= -\frac{1}{c^2(1+c^2)^{3/2}}\left(a_{2,0}a_1^2 + 2a_{1,1}a_1a_2 + a_{0,2}a_2^2\right). \end{aligned} \tag{3.8.32}$$

(c) The Gaussian curvature of $S$ is

$$K = \frac{a_{2,0}a_{0,2} - a_{1,1}^2}{(1+c^2)^2}, \tag{3.8.33}$$

and the mean curvature is

$$H = \frac{1}{2(1+c^2)^{3/2}}\left(a_{2,0}(1 + a_2^2) - 2a_1a_2a_{1,1} + a_{0,2}(1 + a_1^2)\right). \tag{3.8.34}$$


Example 3.8.8 (Computing the Gaussian and mean curvature of a surface). Suppose we want to measure the Gaussian curvature at a point $\begin{pmatrix} a \\ b \\ a^2 - b^2 \end{pmatrix}$ of the surface given by the equation $z = x^2 - y^2$ (the saddle shown in Figure 3.6.1). We make that point our new origin; i.e., we use new translated coordinates, $u, v, w$, where

$$\begin{aligned} x &= a + u \\ y &= b + v \\ z &= a^2 - b^2 + w. \end{aligned} \tag{3.8.35}$$

(The $u$-axis replaces the original $x$-axis, the $v$-axis replaces the $y$-axis, and the $w$-axis replaces the $z$-axis.) Now we rewrite the equation $z = x^2 - y^2$ as

$$\underbrace{a^2 - b^2 + w}_{z} = \underbrace{(a + u)^2}_{x^2} - \underbrace{(b + v)^2}_{y^2} = a^2 + 2au + u^2 - b^2 - 2bv - v^2, \tag{3.8.36}$$
Remember that we set $c = \sqrt{a_1^2 + a_2^2}$.

The first two rows of the right-hand side of Equation 3.8.40 are the rotation matrix we already saw in Equation 3.8.6. The mapping simultaneously rotates by $\alpha$ in the $(x, y)$-plane and lowers by $\alpha$ in the $z$ direction.

Figure 3.8.6.

The helicoid is swept out by a horizontal line, which rotates as it is lifted.


which gives

$$w = 2au - 2bv + u^2 - v^2 = \underbrace{2a}_{a_1} u + \underbrace{(-2b)}_{a_2} v + \frac12\bigl(\underbrace{2}_{a_{2,0}} u^2 + \underbrace{(-2)}_{a_{0,2}} v^2\bigr). \tag{3.8.37}$$

Now we have an equation of the form of Equation 3.8.28, and we can read off the Gaussian curvature, using the values we have found for $a_1$, $a_2$, $a_{2,0}$ and $a_{0,2}$:

$$K = \frac{a_{2,0}a_{0,2} - a_{1,1}^2}{(1+c^2)^2} = \frac{(2 \cdot (-2)) - 0}{(1 + 4a^2 + 4b^2)^2} = \frac{-4}{(1 + 4a^2 + 4b^2)^2}. \tag{3.8.38}$$


Looking at this formula for $K$, what can you say about the surface away from the origin?²⁵

Similarly, we can compute the mean curvature:

$$H = \frac{4(b^2 - a^2)}{(1 + 4a^2 + 4b^2)^{3/2}}. \tag{3.8.39}$$
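The computation of Example 3.8.8 is mechanical enough to script. The sketch below implements the formulas of Proposition 3.8.7(c) exactly as stated there (Equations 3.8.33 and 3.8.34), and feeds them the Taylor coefficients read off in Equation 3.8.37. The function names and the helper `saddle_coefficients` are hypothetical, introduced only for this illustration.

```python
def gaussian_curvature(a1, a2, a20, a11, a02):
    """Gaussian curvature K from Equation 3.8.33, with c^2 = a1^2 + a2^2."""
    c2 = a1 * a1 + a2 * a2
    return (a20 * a02 - a11 * a11) / (1.0 + c2) ** 2

def mean_curvature(a1, a2, a20, a11, a02):
    """Mean curvature H from Equation 3.8.34, with c^2 = a1^2 + a2^2."""
    c2 = a1 * a1 + a2 * a2
    numerator = a20 * (1.0 + a2 * a2) - 2.0 * a1 * a2 * a11 + a02 * (1.0 + a1 * a1)
    return numerator / (2.0 * (1.0 + c2) ** 1.5)

def saddle_coefficients(a, b):
    """Taylor coefficients (a1, a2, a20, a11, a02) of z = x^2 - y^2
    at the point with x = a, y = b, as in Equation 3.8.37."""
    return (2.0 * a, -2.0 * b, 2.0, 0.0, -2.0)

for (a, b) in [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]:
    coeffs = saddle_coefficients(a, b)
    print((a, b), gaussian_curvature(*coeffs), mean_curvature(*coeffs))
```

At the origin this gives $K = -4$ and $H = 0$; moving away from the origin, $K$ stays negative but shrinks in absolute value, in line with Equation 3.8.38.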


Example 3.8.9 (Computing the Gaussian and mean curvature of the helicoid). The helicoid is the surface of equation $y\cos z = x\sin z$. You can imagine it as swept out by a horizontal line going through the $z$-axis, and which turns steadily as the $z$-coordinate changes, making an angle $z$ with the parallel to the $x$-axis through the same point, as shown in Figure 3.8.6.

A first thing to observe is that the mapping

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} \mapsto \begin{pmatrix} x\cos\alpha + y\sin\alpha \\ -x\sin\alpha + y\cos\alpha \\ z - \alpha \end{pmatrix} \tag{3.8.40}$$

is a rigid motion of $\mathbb{R}^3$ that sends the helicoid to itself. In particular, setting $\alpha = z$, this rigid motion sends any point to a point of the form $\begin{pmatrix} r \\ 0 \\ 0 \end{pmatrix}$, and it is enough to compute the Gaussian curvature $K(r)$ at such a point.

We don’t know the helicoid as a graph, but by the implicit function theorem, the equation of the helicoid determines $z$ as a function $g_r\begin{pmatrix} x \\ y \end{pmatrix}$ near $\begin{pmatrix} r \\ 0 \end{pmatrix}$ when $r \neq 0$. What we need then is the Taylor polynomial of $g_r$. Introduce the new coordinate $u$ such that $r + u = x$, and write

$$g_r\begin{pmatrix} u \\ y \end{pmatrix} = z = a_2 y + a_{1,1}uy + \frac12 a_{0,2}y^2 + \cdots. \tag{3.8.41}$$


25 The Gaussian curvature of this surface is always negative, but the further you 
go from the origin, the smaller it is, so the flatter the surface. 
In rewriting

$$y\cos z = (r + u)\sin z$$

as Equation 3.8.42, we replace $\cos z$ by its Taylor polynomial,

$$\cos z = 1 - \frac{z^2}{2!} + \frac{z^4}{4!} - \cdots,$$

keeping only the first term. (The term $z^2/2!$ is quadratic, but $y$ times $z^2/2!$ is cubic.) We replace $\sin z$ by its Taylor polynomial,

$$\sin z = z - \frac{z^3}{3!} + \frac{z^5}{5!} - \cdots,$$

keeping only the first term.

You should expect (Equation 3.8.43) that the coefficients $a_2$ and $a_{1,1}$ will blow up as $r \to 0$, since at the origin the helicoid does not represent $z$ as a function of $x$ and $y$. But the helicoid is a smooth surface at the origin.


Exercise 3.8.4 asks you to justify our omitting the terms $a_1 u$ and $a_{2,0}u^2$.

Introducing this into the equation $y\cos z = (r + u)\sin z$ and keeping only quadratic terms gives

$$y = (r + u)\underbrace{\left(a_2 y + a_{1,1}uy + \frac12 a_{0,2}y^2\right)}_{z \text{ from Equation 3.8.41}} + \cdots. \tag{3.8.42}$$

Identifying linear and quadratic terms gives

$$a_1 = 0, \quad a_2 = \frac1r, \quad a_{1,1} = -\frac{1}{r^2}, \quad a_{0,2} = 0, \quad a_{2,0} = 0. \tag{3.8.43}$$

We can now read off the Gaussian and mean curvatures:

$$\underbrace{K(r)}_{\text{Gaussian curvature}} = \frac{-1}{r^4(1 + 1/r^2)^2} = \frac{-1}{(1 + r^2)^2} \quad\text{and}\quad \underbrace{H(r)}_{\text{mean curvature}} = 0. \tag{3.8.44}$$


We see from the first equation that the Gaussian curvature is always negative and does not blow up as $r \to 0$: as $r \to 0$, $K(r) \to -1$. This is what we should expect, since the helicoid is a smooth surface. The second equation is more interesting yet. It says that the helicoid is a minimal surface: every patch of the helicoid minimizes area among surfaces with the same boundary. △
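The values in Equation 3.8.44 can be reproduced by plugging the coefficients of Equation 3.8.43 into the formulas of Proposition 3.8.7(c). The following sketch does so for a few values of $r$; the function names, including `helicoid_coefficients`, are hypothetical and serve only this illustration.

```python
def gaussian_curvature(a1, a2, a20, a11, a02):
    """Gaussian curvature K from Equation 3.8.33, with c^2 = a1^2 + a2^2."""
    c2 = a1 * a1 + a2 * a2
    return (a20 * a02 - a11 * a11) / (1.0 + c2) ** 2

def mean_curvature(a1, a2, a20, a11, a02):
    """Mean curvature H from Equation 3.8.34, with c^2 = a1^2 + a2^2."""
    c2 = a1 * a1 + a2 * a2
    numerator = a20 * (1.0 + a2 * a2) - 2.0 * a1 * a2 * a11 + a02 * (1.0 + a1 * a1)
    return numerator / (2.0 * (1.0 + c2) ** 1.5)

def helicoid_coefficients(r):
    """Coefficients (a1, a2, a20, a11, a02) of Equation 3.8.43; requires r != 0."""
    return (0.0, 1.0 / r, 0.0, -1.0 / r ** 2, 0.0)

for r in (2.0, 1.0, 0.1, 0.001):
    coeffs = helicoid_coefficients(r)
    K = gaussian_curvature(*coeffs)
    H = mean_curvature(*coeffs)
    # Compare with the closed form -1 / (1 + r^2)^2 of Equation 3.8.44.
    print(r, K, -1.0 / (1.0 + r * r) ** 2, H)
```

Although the individual coefficients $a_2$ and $a_{1,1}$ grow without bound as $r$ shrinks, the computed $K$ approaches $-1$, and $H$ is 0 for every $r$.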


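Proof of Proposition 3.8.7. In the coordinates $X, Y, Z$ (i.e., using the values for $x$, $y$ and $z$ given in Equation 3.8.29) Equation 3.8.28 for $S$ becomes

$$\begin{aligned} \underbrace{\frac{c}{\sqrt{1+c^2}} Y - \frac{1}{\sqrt{1+c^2}} Z}_{z \text{ from Equation 3.8.29}} ={}& a_1\underbrace{\left(-\frac{a_2}{c} X + \frac{a_1}{c\sqrt{1+c^2}} Y + \frac{a_1}{\sqrt{1+c^2}} Z\right)}_{x \text{ from Equation 3.8.29}} + a_2\left(\frac{a_1}{c} X + \frac{a_2}{c\sqrt{1+c^2}} Y + \frac{a_2}{\sqrt{1+c^2}} Z\right) \\ &+ \frac12\Biggl( a_{2,0}\left(-\frac{a_2}{c} X + \frac{a_1}{c\sqrt{1+c^2}} Y + \frac{a_1}{\sqrt{1+c^2}} Z\right)^2 \\ &\quad + 2a_{1,1}\left(-\frac{a_2}{c} X + \frac{a_1}{c\sqrt{1+c^2}} Y + \frac{a_1}{\sqrt{1+c^2}} Z\right)\left(\frac{a_1}{c} X + \frac{a_2}{c\sqrt{1+c^2}} Y + \frac{a_2}{\sqrt{1+c^2}} Z\right) \\ &\quad + a_{0,2}\left(\frac{a_1}{c} X + \frac{a_2}{c\sqrt{1+c^2}} Y + \frac{a_2}{\sqrt{1+c^2}} Z\right)^2 \Biggr) + \cdots. \end{aligned} \tag{3.8.45}$$

We were glad to see that the linear terms in $X$ and $Y$ cancel, showing that we had indeed chosen adapted coordinates. Clearly, the linear terms in $X$ do cancel. For the linear terms in $Y$, remember that $c = \sqrt{a_1^2 + a_2^2}$. So the linear terms on the right are

$$\frac{a_1^2 Y}{c\sqrt{1+c^2}} + \frac{a_2^2 Y}{c\sqrt{1+c^2}} = \frac{c^2 Y}{c\sqrt{1+c^2}}$$

and on the left we have

$$\frac{cY}{\sqrt{1+c^2}}.$$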


We observe that all the linear terms in $X$ and $Y$ cancel, showing that this is an adapted system. The only remaining linear term is $-\sqrt{1+c^2}\,Z$ and the coefficient of $Z$ is not 0, so $D_3F \neq 0$, so the implicit function theorem applies. Thus in these coordinates, Equation 3.8.45 expresses $Z$ as a function of $X$ and $Y$ which starts with quadratic terms. This proves part (a).

Since the expression of $Z$ in terms of $X$ and $Y$ starts with quadratic terms, a term that is linear in $Z$ is actually quadratic in $X$ and $Y$.

To prove part (b), we need to multiply out the right-hand side. Remember
that the linear terms in $X$ and $Y$ have canceled, and that we are interested
only in terms up to degree 2; the terms in $Z$ in the quadratic terms on the right
now contribute terms of degree at least 3, so we can ignore them. We can thus
rewrite Equation 3.8.45 as


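$$
\begin{aligned}
-\sqrt{1+c^2}\,Z = \frac12\Bigg( &a_{2,0}\left(-\frac{a_2}{c}X + \frac{a_1}{c\sqrt{1+c^2}}Y\right)^2
 + 2a_{1,1}\left(-\frac{a_2}{c}X + \frac{a_1}{c\sqrt{1+c^2}}Y\right)\left(\frac{a_1}{c}X + \frac{a_2}{c\sqrt{1+c^2}}Y\right)\\
 &+ a_{0,2}\left(\frac{a_1}{c}X + \frac{a_2}{c\sqrt{1+c^2}}Y\right)^2\Bigg) + \cdots
\end{aligned}
\tag{3.8.46}
$$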


Equation 3.8.47 says
$Z = \frac12\left(A_{2,0}X^2 + 2A_{1,1}XY + A_{0,2}Y^2\right) + \cdots$


If we multiply out, collect terms, and divide by $-\sqrt{1+c^2}$, this becomes

$$
\begin{aligned}
Z = -\frac12\Bigg( &\frac{1}{c^2\sqrt{1+c^2}}\left(a_{2,0}a_2^2 - 2a_{1,1}a_1a_2 + a_{0,2}a_1^2\right)X^2\\
 &+ \frac{2}{c^2(1+c^2)}\left(-a_{2,0}a_1a_2 - a_{1,1}a_2^2 + a_{1,1}a_1^2 + a_{0,2}a_1a_2\right)XY\\
 &+ \frac{1}{c^2(1+c^2)^{3/2}}\left(a_{2,0}a_1^2 + 2a_{1,1}a_1a_2 + a_{0,2}a_2^2\right)Y^2\Bigg) + \cdots
\end{aligned}
\tag{3.8.47}
$$

This proves part (b).

To see part (c), we just compute the Gaussian curvature, $K = A_{2,0}A_{0,2} - A_{1,1}^2$:

$$
\begin{aligned}
A_{2,0}A_{0,2} - A_{1,1}^2 &= \frac{1}{c^4(1+c^2)^2}\Big( \left(a_{2,0}a_2^2 - 2a_{1,1}a_1a_2 + a_{0,2}a_1^2\right)\left(a_{2,0}a_1^2 + 2a_{1,1}a_1a_2 + a_{0,2}a_2^2\right)\\
&\qquad\qquad\qquad - \left(a_{2,0}a_1a_2 + a_{1,1}a_2^2 - a_{1,1}a_1^2 - a_{0,2}a_1a_2\right)^2\Big)\\
&= \frac{a_{2,0}a_{0,2} - a_{1,1}^2}{(1+c^2)^2}. \qquad\square
\end{aligned}
\tag{3.8.48}
$$

This involves some quite miraculous cancellations. The mean curvature computation is similar, and left as Exercise 3.8.10; it also involves some miraculous cancellations.

Knot theory is a very active field of research today, with remarkable connections to physics (especially the latest darling of theoretical physicists: string theory).
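The cancellations behind Equation 3.8.48 can be spot-checked numerically. The Python sketch below reads the coefficients $A_{2,0}$, $A_{1,1}$, $A_{0,2}$ off Equation 3.8.47 and compares $A_{2,0}A_{0,2} - A_{1,1}^2$ with the closed form on the right of Equation 3.8.48. The function names and the sample values are illustrative assumptions, not part of the text, and a numerical agreement is of course not a proof.

```python
import math

def adapted_coefficients(a1, a2, a20, a11, a02):
    """Coefficients A20, A11, A02 of Z = (A20 X^2 + 2 A11 XY + A02 Y^2)/2 + ...,
    read off from Equation 3.8.47 (hypothetical helper name)."""
    c2 = a1 * a1 + a2 * a2                      # c^2 = a1^2 + a2^2
    if c2 == 0.0:
        raise ValueError("Equation 3.8.47 divides by c^2, so (a1, a2) must be nonzero")
    root = math.sqrt(1.0 + c2)                  # sqrt(1 + c^2)
    A20 = -(a20 * a2**2 - 2 * a11 * a1 * a2 + a02 * a1**2) / (c2 * root)
    A11 = -(-a20 * a1 * a2 - a11 * a2**2 + a11 * a1**2 + a02 * a1 * a2) / (c2 * root**2)
    A02 = -(a20 * a1**2 + 2 * a11 * a1 * a2 + a02 * a2**2) / (c2 * root**3)
    return A20, A11, A02

def gaussian_curvature(a1, a2, a20, a11, a02):
    """Left side of Equation 3.8.48: A20*A02 - A11^2."""
    A20, A11, A02 = adapted_coefficients(a1, a2, a20, a11, a02)
    return A20 * A02 - A11**2

def closed_form(a1, a2, a20, a11, a02):
    """Right side of Equation 3.8.48."""
    return (a20 * a02 - a11**2) / (1.0 + a1 * a1 + a2 * a2) ** 2

sample = (0.3, -1.2, 2.0, 0.5, -0.7)            # arbitrary test values
print(gaussian_curvature(*sample), closed_form(*sample))
```

The two printed numbers should agree up to rounding error for any choice of coefficients with $(a_1, a_2) \neq (0, 0)$.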


Coordinates adapted to space curves 

Curves in $\mathbb{R}^3$ have considerably simpler local geometry than do surfaces: essentially everything about them is in Propositions 3.8.12 and 3.8.13 below. Their global geometry is quite a different matter: they can tangle, knot, link, etc. in the most fantastic ways.

Suppose $C \subset \mathbb{R}^3$ is a smooth curve, and $\mathbf{a} \in C$ is a point. What new coordinate system $X, Y, Z$ is well adapted to $C$ at $\mathbf{a}$? Of course, we will take the origin of the new system at $\mathbf{a}$, and if we demand that the $X$-axis be tangent to $C$ at $\mathbf{a}$, and call the other two coordinates $U$ and $V$, then near $\mathbf{a}$ the curve $C$ will have an equation of the form

$$
\begin{aligned}
U &= f(X) = \frac12 a_2X^2 + \frac16 a_3X^3 + \cdots\\
V &= g(X) = \frac12 b_2X^2 + \frac16 b_3X^3 + \cdots,
\end{aligned}
\tag{3.8.49}
$$

where both coordinates start with quadratic terms. But it turns out that we can do better, at least when $C$ is at least three times differentiable, and $a_2^2 + b_2^2 \neq 0$.

Suppose we rotate the coordinate system around the $X$-axis by an angle $\theta$, and call $X, Y, Z$ the new (final) coordinates. Let $c = \cos\theta$, $s = \sin\theta$: this means setting

$$U = cY + sZ \quad\text{and}\quad V = -sY + cZ. \tag{3.8.50}$$

We again use the rotation matrix of Equation 3.8.6:

$$\begin{bmatrix} \cos\theta & \sin\theta\\ -\sin\theta & \cos\theta \end{bmatrix}.$$

Remember that $s = \sin\theta$ and $c = \cos\theta$, so $c^2 + s^2 = 1$.

Substituting these expressions into Equation 3.8.49 leads to

$$
\begin{aligned}
cY + sZ &= \frac12 a_2X^2 + \frac16 a_3X^3 + \cdots\\
-sY + cZ &= \frac12 b_2X^2 + \frac16 b_3X^3 + \cdots.
\end{aligned}
\tag{3.8.51}
$$


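We solve these equations for $Y$ by multiplying the first through by $c$ and the second by $-s$ and adding the results:

$$Y(c^2 + s^2) = Y = \frac12(ca_2 - sb_2)X^2 + \frac16(ca_3 - sb_3)X^3 + \cdots. \tag{3.8.52}$$

A similar computation gives

$$Z = \frac12(sa_2 + cb_2)X^2 + \frac16(sa_3 + cb_3)X^3 + \cdots. \tag{3.8.53}$$

The point of all this is that we want to choose the angle $\theta$ (the angle by which we rotate the coordinate system around the $X$-axis) so that the $Z$-component of the curve begins with cubic terms. We achieve this by setting

$$c = \cos\theta = \frac{a_2}{\sqrt{a_2^2 + b_2^2}} \quad\text{and}\quad s = \sin\theta = \frac{-b_2}{\sqrt{a_2^2 + b_2^2}}, \quad\text{so that}\quad \tan\theta = -\frac{b_2}{a_2}; \tag{3.8.54}$$

this gives

$$
\begin{aligned}
Y &= \frac12\sqrt{a_2^2 + b_2^2}\;X^2 + \frac16\,\frac{a_2a_3 + b_2b_3}{\sqrt{a_2^2 + b_2^2}}\;X^3 + \cdots = \frac12 A_2X^2 + \frac16 A_3X^3 + \cdots\\
Z &= \frac16\,\frac{-b_2a_3 + a_2b_3}{\sqrt{a_2^2 + b_2^2}}\;X^3 + \cdots = \frac16 B_3X^3 + \cdots.
\end{aligned}
\tag{3.8.55}
$$

$$A_2 = \sqrt{a_2^2 + b_2^2}, \qquad B_3 = \frac{-b_2a_3 + a_2b_3}{\sqrt{a_2^2 + b_2^2}}$$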





The word osculating comes
from the Latin osculari, “to kiss.”


Note that the torsion is defined 
only when the curvature is not 
zero. The osculating plane is the 
plane that the curve is most nearly 
in, and the torsion measures how 
fast the curve pulls away from it. 
It measures the “non-planarity” of
the curve.


A curve in R n can be parame- 
trized by arc length because curves 
have no intrinsic geometry; you 
could represent the Amazon River 
as a straight line without distort- 
ing its length. Surfaces and other 
manifolds of higher dimension can- 
not be parametrized by anything 
analogous to arc length; any at- 
tempt to represent the surface of 
the globe as a flat map necessar- 
ily distorts sizes and shapes of the 
continents. Gaussian curvature is 
the obstruction. 


Imagine that you are driving in 
the dark, and that the first unit 
vector is the shaft of light pro- 
duced by your headlights. 

We know the acceleration must
be orthogonal to the curve because
your speed is constant; there is no
component of acceleration in the
direction you are going.

Alternatively, you can derive
$2\delta' \cdot \delta'' = 0$
from $|\delta'|^2 = 1$.


The $Z$-component measures the distance of the curve from the $(X, Y)$-plane;
since $Z$ is small (it begins with cubic terms), the curve stays mainly in that plane. The $(X, Y)$-plane
is called the osculating plane to $C$ at $\mathbf{a}$.

This is our best adapted coordinate system for the curve at $\mathbf{a}$, which exists
and is unique unless $a_2 = b_2 = 0$. The number $\kappa = A_2 > 0$ is called the
curvature of $C$ at $\mathbf{a}$, and the number $\tau = B_3/A_2$ is called the torsion of $C$ at $\mathbf{a}$.

Definition 3.8.10 (Curvature of a space curve). The curvature of a space curve $C$ at $\mathbf{a}$ is

$$\kappa = A_2 > 0.$$

Definition 3.8.11 (Torsion of a space curve). The torsion of a space curve $C$ at $\mathbf{a}$ is

$$\tau = B_3/A_2.$$
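The passage from the Taylor coefficients of Equation 3.8.49 to curvature and torsion is short enough to sketch in Python. The function below follows Equations 3.8.52 to 3.8.55 step by step; its name and signature are hypothetical, chosen only for this illustration.

```python
import math

def curvature_torsion_from_taylor(a2, a3, b2, b3):
    """Curvature and torsion at the point a, from the coefficients of
    U = a2 X^2/2 + a3 X^3/6 + ...,  V = b2 X^2/2 + b3 X^3/6 + ...  (Equation 3.8.49).

    Illustrative helper; requires a2^2 + b2^2 != 0, as in the text.
    """
    A2 = math.hypot(a2, b2)              # A2 = sqrt(a2^2 + b2^2)
    if A2 == 0.0:
        raise ValueError("the adapted system is not defined when a2 = b2 = 0")
    c = a2 / A2                          # cos(theta), Equation 3.8.54
    s = -b2 / A2                         # sin(theta), Equation 3.8.54
    # With this choice, the quadratic coefficient s*a2 + c*b2 of Z vanishes.
    B3 = s * a3 + c * b3                 # cubic coefficient of Z (Equation 3.8.53)
    kappa = A2                           # Definition 3.8.10
    tau = B3 / A2                        # Definition 3.8.11
    return kappa, tau

# Coefficients for U = X^2, V = X^3: a2 = 2, a3 = 0, b2 = 0, b3 = 6.
print(curvature_torsion_from_taylor(2.0, 0.0, 0.0, 6.0))
```

For the sample coefficients, $A_2 = 2$ and $B_3 = 6$, so the sketch returns curvature 2 and torsion 3.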


Parametrization of space curves by arc length: the Frenet frame


Usually, the geometry of space curves is developed using parametrizations by 
arc length rather than by adapted coordinates. Above, we emphasized adapted 
coordinates because they generalize to manifolds of higher dimension, while 
parametrizations by arc length do not. 

The main ingredient of the approach using parametrization by arc length 
is the Frenet frame. Imagine driving at unit speed along the curve, perhaps 
by turning on cruise control. Then (at least if the curve is really curvy, not 
straight) at each instant you have a distinguished basis of R 3 . The first unit 
vector is the velocity vector, pointing in the direction of the curve. The second 
vector is the acceleration vector, normalized to have length 1 . It is orthogonal 
to the curve, and points in the direction in which the force is being applied — 
i.e., in the opposite direction of the centrifugal force you feel. The third basis 
vector is the binormal , orthogonal to the other two vectors. 

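So, if $\delta : \mathbb{R} \to \mathbb{R}^3$ is the parametrization by arc length of the curve, the three vectors are:

$$
\underbrace{\mathbf{t}(s) = \delta'(s)}_{\text{velocity vector}}, \qquad
\underbrace{\mathbf{n}(s) = \frac{\mathbf{t}'(s)}{|\mathbf{t}'(s)|} = \frac{\delta''(s)}{|\delta''(s)|}}_{\text{normalized acceleration vector}}, \qquad
\underbrace{\mathbf{b}(s) = \mathbf{t}(s) \times \mathbf{n}(s)}_{\text{binormal}}.
\tag{3.8.56}
$$

The propositions below relate the Frenet frame to the adapted coordinates;
they provide another description of curvature and torsion, and show that the
two approaches coincide. The same computations prove both; they are proved
in Appendix A.11.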
Proposition 3.8.12 (Frenet frame). The point with coordinates $X, Y, Z$
(as in Equation 3.8.55) is the point

$$\mathbf{a} + X\,\mathbf{t}(0) + Y\,\mathbf{n}(0) + Z\,\mathbf{b}(0). \tag{3.8.57}$$

Equivalently, the vectors $\mathbf{t}(0), \mathbf{n}(0), \mathbf{b}(0)$ form the orthonormal basis (Frenet
frame) with respect to which our adapted coordinates are computed.


Equation 3.8.58 corresponds to
the antisymmetric matrix

$$\begin{bmatrix} 0 & \kappa & 0\\ -\kappa & 0 & \tau\\ 0 & -\tau & 0 \end{bmatrix}.$$

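Proposition 3.8.13 (Curvature and torsion of a space curve). The
Frenet frame satisfies the following equations, where $\kappa$ is the curvature of the
curve at $\mathbf{a}$ and $\tau$ is its torsion:

$$
\begin{aligned}
\mathbf{t}'(0) &= \kappa\,\mathbf{n}(0)\\
\mathbf{n}'(0) &= -\kappa\,\mathbf{t}(0) + \tau\,\mathbf{b}(0)\\
\mathbf{b}'(0) &= -\tau\,\mathbf{n}(0).
\end{aligned}
\tag{3.8.58}
$$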
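As an illustration of Equation 3.8.58, the Python sketch below checks the three relations numerically on a circular helix parametrized by arc length. The helix is not an example from this section; it is assumed here because its Frenet frame, curvature $A/(A^2+B^2)$ and torsion $B/(A^2+B^2)$ can be written down by hand. The relations are tested at several values of $s$, each point in turn playing the role of $\mathbf{a}$.

```python
import math

A, B = 2.0, 1.0                 # helix radius and rise per radian (illustrative values)
L = math.hypot(A, B)            # speed of t -> (A cos t, A sin t, B t)
KAPPA = A / L**2                # curvature of this helix
TAU = B / L**2                  # torsion of this helix

def frame(s):
    """Frenet frame (t, n, b) of delta(s) = (A cos(s/L), A sin(s/L), B s/L)."""
    u = s / L
    t = (-A * math.sin(u) / L, A * math.cos(u) / L, B / L)
    n = (-math.cos(u), -math.sin(u), 0.0)
    b = (B * math.sin(u) / L, -B * math.cos(u) / L, A / L)   # b = t x n
    return t, n, b

def frame_derivative(s, h=1e-5):
    """Central-difference derivative of each frame vector with respect to s."""
    plus, minus = frame(s + h), frame(s - h)
    return tuple(
        tuple((p - m) / (2 * h) for p, m in zip(vp, vm))
        for vp, vm in zip(plus, minus)
    )

def frenet_residual(s):
    """Largest entry of t' - k n, n' + k t - tau b and b' + tau n at s."""
    t, n, b = frame(s)
    dt, dn, db = frame_derivative(s)
    residuals = []
    for i in range(3):
        residuals.append(dt[i] - KAPPA * n[i])
        residuals.append(dn[i] + KAPPA * t[i] - TAU * b[i])
        residuals.append(db[i] + TAU * n[i])
    return max(abs(r) for r in residuals)

for s in (0.0, 0.7, 3.0):
    print(s, frenet_residual(s))
```

The residuals are limited only by the finite-difference step, so they should be far smaller than the curvature and torsion themselves.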


Exercise 3.8.9 asks you to ex- 
plain where this antisymmetry 
comes from. 


Propositions 3.8.14 and 3.8.15
make the computation of curvature
and torsion straightforward
for any parametrized curve in $\mathbb{R}^3$.


Computing curvature and torsion of parametrized curves 

We now have two equations that in principle should allow us to compute curvature and torsion of a space curve: Equations 3.8.55 and 3.8.58. Unfortunately, these equations are hard to use. Equation 3.8.55 requires knowing an adapted coordinate system, which leads to very cumbersome formulas, whereas Equation 3.8.58 requires a parametrization by arc length. Such a parametrization is only known as the inverse of a function which is itself an indefinite integral that can rarely be computed in closed form. However, the Frenet formulas can be adapted to any parametrized curve:


Proposition 3.8.14 (Curvature of a parametrized curve). The curvature $\kappa$ of a curve parametrized by $\gamma : \mathbb{R} \to \mathbb{R}^3$ is

$$\kappa(t) = \frac{|\gamma'(t) \times \gamma''(t)|}{|\gamma'(t)|^3}. \tag{3.8.59}$$


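Proposition 3.8.15 (Torsion of a parametrized curve). The torsion $\tau$
of a parametrized curve is

$$
\begin{aligned}
\tau(t) &= \frac{(\gamma'(t) \times \gamma''(t)) \cdot \gamma'''(t)}{|\gamma'(t)|^6}\cdot\frac{|\gamma'(t)|^6}{|\gamma'(t) \times \gamma''(t)|^2}\\
&= \frac{(\gamma'(t) \times \gamma''(t)) \cdot \gamma'''(t)}{|\gamma'(t) \times \gamma''(t)|^2}.
\end{aligned}
\tag{3.8.60}
$$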
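Example 3.8.16 (Computing curvature and torsion of a parametrized curve). Let $\gamma(t) = \begin{pmatrix} t\\ t^2\\ t^3 \end{pmatrix}$. Then

$$
\gamma'(t) = \begin{pmatrix} 1\\ 2t\\ 3t^2 \end{pmatrix}, \quad
\gamma''(t) = \begin{pmatrix} 0\\ 2\\ 6t \end{pmatrix}, \quad
\gamma'''(t) = \begin{pmatrix} 0\\ 0\\ 6 \end{pmatrix}.
\tag{3.8.61}
$$

So we find

$$
\kappa(t) = \frac{1}{(1 + 4t^2 + 9t^4)^{3/2}}\left|\begin{pmatrix} 1\\ 2t\\ 3t^2 \end{pmatrix} \times \begin{pmatrix} 0\\ 2\\ 6t \end{pmatrix}\right|
= 2\,\frac{(1 + 9t^2 + 9t^4)^{1/2}}{(1 + 4t^2 + 9t^4)^{3/2}}
\tag{3.8.62}
$$

and

$$
\tau(t) = \frac{1}{4(1 + 9t^2 + 9t^4)}\begin{pmatrix} 6t^2\\ -6t\\ 2 \end{pmatrix} \cdot \begin{pmatrix} 0\\ 0\\ 6 \end{pmatrix}
= \frac{3}{1 + 9t^2 + 9t^4}.
\tag{3.8.63}
$$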


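Since $\gamma(t) = \begin{pmatrix} t\\ t^2\\ t^3 \end{pmatrix}$, we have $Y = X^2$ and $Z = X^3$, so Equation 3.8.55 says that $Y = X^2 = \tfrac{2}{2}X^2$, so $A_2 = 2$.

To go from the first to the second line of Equation 3.8.66 we use Proposition 3.8.13, which says that $\mathbf{t}' = \kappa\mathbf{n}$.

Note in the second line of Equation 3.8.66 that we are adding vectors to get a vector:

$$
\underbrace{\kappa(s(t))\,(s'(t))^2}_{\text{number}}\;\underbrace{\mathbf{n}(s(t))}_{\text{vec.}}
+ \underbrace{s''(t)}_{\text{no.}}\;\underbrace{\mathbf{t}(s(t))}_{\text{vec.}}
$$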

At the origin, the standard coordinates are adapted to the curve, so from Equation 3.8.55 we find $A_2 = 2$, $B_3 = 6$; hence $\kappa = A_2 = 2$ and $\tau = B_3/A_2 = 3$.
This agrees with the formulas above when $t = 0$. △
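The computation in Example 3.8.16 can be reproduced mechanically. The Python sketch below evaluates the formulas of Propositions 3.8.14 and 3.8.15 for the curve $t \mapsto (t, t^2, t^3)$ and prints them next to the closed forms of Equations 3.8.62 and 3.8.63. The helper names are assumptions made for this illustration.

```python
import math

def cross(u, v):
    """Cross product of two vectors in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def curvature(d1, d2):
    """Proposition 3.8.14: |g' x g''| / |g'|^3."""
    w = cross(d1, d2)
    return math.sqrt(dot(w, w)) / dot(d1, d1) ** 1.5

def torsion(d1, d2, d3):
    """Proposition 3.8.15: (g' x g'') . g''' / |g' x g''|^2."""
    w = cross(d1, d2)
    return dot(w, d3) / dot(w, w)

def twisted_cubic_derivatives(t):
    """First three derivatives of gamma(t) = (t, t^2, t^3), as in Equation 3.8.61."""
    return (1.0, 2 * t, 3 * t * t), (0.0, 2.0, 6 * t), (0.0, 0.0, 6.0)

for t in (0.0, 0.5, 1.0):
    d1, d2, d3 = twisted_cubic_derivatives(t)
    p = 1 + 4 * t**2 + 9 * t**4
    q = 1 + 9 * t**2 + 9 * t**4
    print(t, curvature(d1, d2), 2 * math.sqrt(q) / p**1.5, torsion(d1, d2, d3), 3 / q)
```

At $t = 0$ the computed values are $\kappa = 2$ and $\tau = 3$, matching the values read off from the adapted coordinates.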

Proof of Proposition 3.8.14 (curvature of a parametrized curve). We will assume that we have a parametrized curve $\gamma : \mathbb{R} \to \mathbb{R}^3$; you should imagine that you are driving along some winding mountain road, and that $\gamma(t)$ is the position of your car at time $t$. Since our computation will use Equation 3.8.58, we will also use parametrization by arc length; we will denote by $\delta(s)$ the position of the car when the odometer is $s$, while $\gamma$ denotes an arbitrary parametrization. These are related by the formula

$$\gamma(t) = \delta(s(t)), \quad\text{where}\quad s(t) = \int_{t_0}^{t} |\gamma'(u)|\,du, \tag{3.8.64}$$

and $t_0$ is the time when the odometer was set to 0. The function $s(t)$ gives you the odometer reading as a function of time. The unit vectors $\mathbf{t}$, $\mathbf{n}$ and $\mathbf{b}$ will be considered as functions of $s$, as will the curvature $\kappa$ and the torsion $\tau$.
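Since the odometer function $s(t)$ of Equation 3.8.64 rarely has a closed form, it is natural to approximate it numerically. The Python sketch below does so for the curve $t \mapsto (t, t^2, t^3)$ using composite Simpson's rule; the choice of quadrature rule, the function names, and the number of subintervals are assumptions of this illustration, not something the proof relies on.

```python
import math

def speed(t):
    """|gamma'(t)| for gamma(t) = (t, t^2, t^3)."""
    return math.sqrt(1 + 4 * t * t + 9 * t**4)

def odometer(t, t0=0.0, n=200):
    """Approximate s(t), the integral of |gamma'(u)| from t0 to t,
    by composite Simpson's rule with n (even) subintervals."""
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of subintervals")
    h = (t - t0) / n
    total = speed(t0) + speed(t)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * speed(t0 + k * h)
    return total * h / 3

s1 = odometer(1.0)
# Difference quotient of s at t = 1, to compare with |gamma'(1)| (Equation 3.8.68).
slope = (odometer(1.0 + 1e-3) - odometer(1.0 - 1e-3)) / 2e-3
print(s1, slope, speed(1.0))
```

The arc length from $t = 0$ to $t = 1$ must exceed the length $\sqrt{3}$ of the chord joining the endpoints, and the difference quotient of the computed $s$ should be close to the speed $\sqrt{14}$ at $t = 1$.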

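We now use the chain rule to compute three successive derivatives of $\gamma$. In Equation 3.8.65, recall (Equation 3.8.56) that $\delta' = \mathbf{t}$; in the second line of Equation 3.8.66, recall (Equation 3.8.58) that $\mathbf{t}'(0) = \kappa\mathbf{n}(0)$:

$$\text{(1)}\quad \gamma'(t) = \delta'(s(t))\,s'(t) = s'(t)\,\mathbf{t}(s(t)), \tag{3.8.65}$$

$$
\begin{aligned}
\text{(2)}\quad \gamma''(t) &= \mathbf{t}'(s(t))\,(s'(t))^2 + \mathbf{t}(s(t))\,s''(t)\\
&= \kappa(s(t))\,(s'(t))^2\,\mathbf{n}(s(t)) + s''(t)\,\mathbf{t}(s(t)).
\end{aligned}
\tag{3.8.66}
$$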
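Equation 3.8.69: by Equations 3.8.65 and 3.8.66,

$$\gamma'(t) \times \gamma''(t) = s'(t)\,\mathbf{t}(s(t)) \times \Big[\kappa(s(t))\,(s'(t))^2\,\mathbf{n}(s(t)) + s''(t)\,\mathbf{t}(s(t))\Big].$$

Since for any vector $\mathbf{t}$, $\mathbf{t} \times \mathbf{t} = \mathbf{0}$, and since (Equation 3.8.56) $\mathbf{t} \times \mathbf{n} = \mathbf{b}$, this gives

$$
\begin{aligned}
\gamma'(t) \times \gamma''(t) &= s'(t)\,\mathbf{t}(s(t)) \times \Big[\kappa(s(t))\,(s'(t))^2\,\mathbf{n}(s(t))\Big]\\
&= \kappa(s(t))\,(s'(t))^3\,\mathbf{b}(s(t)).
\end{aligned}
$$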


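$$
\begin{aligned}
\text{(3)}\quad \gamma'''(t) &= \kappa'(s(t))\,\mathbf{n}(s(t))\,(s'(t))^3 + \kappa(s(t))\,\mathbf{n}'(s(t))\,(s'(t))^3 + 2\kappa(s(t))\,\mathbf{n}(s(t))\,s'(t)\,s''(t)\\
&\quad + \mathbf{t}'(s(t))\,s'(t)\,s''(t) + \mathbf{t}(s(t))\,s'''(t)\\
&= \Big(-(\kappa(s(t)))^2\,(s'(t))^3 + s'''(t)\Big)\,\mathbf{t}(s(t))\\
&\quad + \Big(\kappa'(s(t))\,(s'(t))^3 + 3\kappa(s(t))\,s'(t)\,s''(t)\Big)\,\mathbf{n}(s(t))\\
&\quad + \Big(\kappa(s(t))\,\tau(s(t))\,(s'(t))^3\Big)\,\mathbf{b}(s(t)).
\end{aligned}
\tag{3.8.67}
$$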

Since $\mathbf{t}$ has length 1, Equation 3.8.65 gives us

$$s'(t) = |\gamma'(t)|, \tag{3.8.68}$$

which we already knew from the definition of $s$. Equations 3.8.65 and 3.8.66 give

$$\gamma'(t) \times \gamma''(t) = \kappa(s(t))\,(s'(t))^3\,\mathbf{b}(s(t)), \tag{3.8.69}$$

since $\mathbf{t} \times \mathbf{n} = \mathbf{b}$. Since $\mathbf{b}$ has length 1,

$$|\gamma'(t) \times \gamma''(t)| = \kappa(s(t))\,(s'(t))^3. \tag{3.8.70}$$

Using Equation 3.8.68, this gives the formula for the curvature of Proposition 3.8.14. □


Proof of Proposition 3.8.15 (Torsion of a parametrized curve). Since $\gamma' \times \gamma''$ points in the direction of $\mathbf{b}$, dotting it with $\gamma'''$ will pick out the coefficient of $\mathbf{b}$ for $\gamma'''$. This leads to

$$
(\gamma'(t) \times \gamma''(t)) \cdot \gamma'''(t) = \tau(s(t))\,\underbrace{(\kappa(s(t)))^2\,(s'(t))^6}_{\text{square of Equation 3.8.70}},
\tag{3.8.71}
$$

which gives us the formula for torsion found in Proposition 3.8.15. □
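The point of the two proofs is that the formulas of Propositions 3.8.14 and 3.8.15 do not depend on the parametrization. The Python sketch below illustrates this on a circular helix, which is assumed here as a test curve because its curvature $a/(a^2+b^2)$ and torsion $b/(a^2+b^2)$ are known. It estimates the first three derivatives by central differences, once for the usual parametrization and once after the change of parameter $t = u + u^3$. The step size and function names are choices made for this illustration.

```python
import math

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def derivatives(gamma, t, h=1e-3):
    """Central-difference estimates of gamma', gamma'', gamma''' at t."""
    p = {k: gamma(t + k * h) for k in (-2, -1, 0, 1, 2)}
    d1 = tuple((p[1][i] - p[-1][i]) / (2 * h) for i in range(3))
    d2 = tuple((p[1][i] - 2 * p[0][i] + p[-1][i]) / h**2 for i in range(3))
    d3 = tuple((p[2][i] - 2 * p[1][i] + 2 * p[-1][i] - p[-2][i]) / (2 * h**3)
               for i in range(3))
    return d1, d2, d3

def curvature_torsion(gamma, t):
    """Equations 3.8.59 and 3.8.60 applied to numerical derivatives."""
    d1, d2, d3 = derivatives(gamma, t)
    w = cross(d1, d2)
    kappa = math.sqrt(dot(w, w)) / dot(d1, d1) ** 1.5
    tau = dot(w, d3) / dot(w, w)
    return kappa, tau

def helix(t, a=2.0, b=1.0):
    return (a * math.cos(t), a * math.sin(t), b * t)

def helix_reparametrized(u):
    """Same curve, traversed with the parameter change t = u + u^3."""
    return helix(u + u**3)

# u = 0.5 corresponds to t = 0.5 + 0.125 = 0.625: the same point of the curve.
print(curvature_torsion(helix, 0.625))
print(curvature_torsion(helix_reparametrized, 0.5))
```

Both lines should print values close to $(0.4, 0.2)$, the curvature and torsion of the helix with $a = 2$, $b = 1$, even though the two parametrizations have different speeds and accelerations at that point.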


3.9 Exercises for Chapter Three 


Exercises for Section 3.1:
Curves and Surfaces

3.1.1 (a) For what values of the constant $c$ is the locus of equation $\sin(x + y) = c$ a smooth curve?

(b) What is the equation for the tangent line to such a curve at a point $\begin{pmatrix} u\\ v \end{pmatrix}$?

3.1.2 (a) For what values of $c$ is the set $X_c$ of equation $x^2 + y^3 = c$ a smooth curve?

(b) Give the equation of the tangent line at a point $\begin{pmatrix} u\\ v \end{pmatrix}$ of such a curve $X_c$.
We strongly advocate using
Matlab or similar software.


Hint for Exercise 3.1.7 (a): This
does not require the implicit function
theorem.


(c) Sketch this curve for a representative sample of values of $c$.

3.1.3 (a) For what values of $c$ is the set $Y_c$ of equation $x^2 + y^3 + z^4 = c$ a smooth surface?

(b) Give the equation of the tangent plane at a point $\begin{pmatrix} u\\ v\\ w \end{pmatrix}$ of the surface $Y_c$.

(c) Sketch this surface for a representative sample of values of $c$.

3.1.4 Show that every straight line in the plane is a smooth curve. 

3.1.5 In Example 3.1.15, show that S 2 is a smooth surface, using D XiU , D X%1 
and D y . x ; the half-axes R+, R+ and R+; and the mappings 

± >Jx 2 -f y 2 - 1. ±y/x 2 + z 2 - 1 and ± \Jy 2 + z 2 - 1. 


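3.1.6 (a) Show that the set $\left\{ \begin{pmatrix} x\\ y \end{pmatrix} \in \mathbb{R}^2 \;\middle|\; x + x^2 + y^2 = 2 \right\}$ is a smooth curve.

(b) What is an equation for the tangent line to this curve at a point $\begin{pmatrix} u\\ v \end{pmatrix}$?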

3.1.7 (a) Show that for all $a$ and $b$, the sets $X_a$ and $Y_b$ of equation

$$x^2 + y^3 + z = a \quad\text{and}\quad x + y + z = b$$

respectively are smooth surfaces in $\mathbb{R}^3$.

(b) For what values of $a$ and $b$ is the intersection $X_a \cap Y_b$ a smooth curve?
What geometric relation is there between $X_a$ and $Y_b$ for the other values of $a$
and $b$?

3.1.8 (a) For what values of $a$ and $b$ are the sets $X_a$ and $Y_b$ of equation

$$x - y^2 = a \quad\text{and}\quad x^2 + y^2 + z^2 = b$$

respectively smooth surfaces in $\mathbb{R}^3$?

(b) For what values of $a$ and $b$ is the intersection $X_a \cap Y_b$ a smooth curve?
What geometric relation is there between $X_a$ and $Y_b$ for the other values of $a$
and $b$?

3.1.9 Show that if at a particular point $\mathbf{x}_0$ a surface is simultaneously the
graph of $z$ as a function of $x$ and $y$, and $y$ as a function of $x$ and $z$, and $x$ as
a function of $y$ and $z$ (see Definition 3.1.13), then the corresponding equations
for the tangent planes to the surface at $\mathbf{x}_0$ denote the same plane.
You are encouraged to use a
computer, although it is not
absolutely necessary.


3.1.10 For each of the following functions / ) and points ( a ) State 

whether there is a tangent plane to the graph of / at the point 



(b) If there is, find its equation, and compute the intersection of the tangent 
plane with the graph. 

at the point ^ ^ 
at the point j 

(c ) / ( y ) = \Jx 2 + y 2 at the point ( „ } ) 


(a) =x 2 -y 2 

(b) /(£) = 


(d) f (y) = cos(* 2 + y) at the point f q) 


3.1.11 Find quadratic polynomials $p$ and $q$ for which the function

$$F\begin{pmatrix} x\\ y \end{pmatrix} = x^4 + y^4 + x^2 - y^2$$

of Example 3.1.11 can be written

$$F\begin{pmatrix} x\\ y \end{pmatrix} = p(x)^2 + q(y)^2 - \frac12.$$

Hint for Exercise 3.1.12,
part (b): write that $\mathbf{x} - \gamma(t)$ is
a multiple of $\gamma'(t)$, which leads to
two equations in $x, y, z$ and $t$. Now
eliminate $t$ among these equations;
it takes a bit of fiddling with the
algebra.

Hint for part (c): show that
the only common zeroes of $f$ and
$[Df]$ are the points of $C$; again this
requires a bit of fiddling with the
algebra.


Sketch the graphs of p, q,p 2 and q 2 , and describe the connection between your 
graph and Figure 3.1.8. 

3.1.12 Let $C \subset \mathbb{R}^3$ be the curve parametrized by $\gamma(t) = \begin{pmatrix} t\\ t^2\\ t^3 \end{pmatrix}$. Let $X$ be the union of all the lines tangent to $C$.

(a) Find a parametrization of $X$.

(b) Find an equation $f(\mathbf{x}) = 0$ for $X$.

(c) Show that $X - C$ is a smooth surface.

(d) Find the equation of the curve which is the intersection of $X$ with the
plane $x = 0$.


Part (b): A parametrization of 
this curve is not too hard to find, 
but a computer will certainly help 
in describing the curve. 


( cost \ 
sint I . 

(a) Find a parametrization for the union X of all the tangent lines to C. Use 
a computer program to visualize this surface. 

(b) What is the intersection of X with the (x, z)-plane? 

(c) Show that $X$ contains infinitely many curves of double points, where $X$
intersects itself; these curves are helicoids on cylinders $x^2 + y^2 = r_i^2$. Find an
equation for the numbers $r_i$, and use Newton’s method to compute $r_1, r_2, r_3$.
3.1.14 (a) What is the equation of the plane containing the point $\mathbf{a}$ perpendicular to the vector $\mathbf{v}$?

(b) Let $\gamma(t) = \begin{pmatrix} t\\ t^2\\ t^3 \end{pmatrix}$, and $P_t$ be the plane through the point $\gamma(t)$ and perpendicular to $\gamma'(t)$. What is the equation of $P_t$?

(c) Show that if $t_1 \neq t_2$, the planes $P_{t_1}$ and $P_{t_2}$ always intersect in a line.
What are the equations of the line $P_1 \cap P_t$?

(d) What is the limiting position of the line $P_1 \cap P_{1+h}$ as $h$ tends to 0?


Hint: Think that $\sin\alpha = 0$
if and only if $\alpha = k\pi$ for some
integer $k$.


3.1.15 In Example 3.1.17, what does the surface of equation

$$f\begin{pmatrix} x\\ y\\ z \end{pmatrix} = \sin(x + yz) = 0$$

look like?


3.1.16 (a) Show that the set $X \subset \mathbb{R}^3$ of equation

$$x^3 + xy^2 + yz^2 + z^3 = 4$$

is a smooth surface.

(b) What is the equation of the tangent plane to X at the point 



7 


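3.1.17 Let $f\begin{pmatrix} x\\ y \end{pmatrix} = 0$ be the equation of a curve $X \subset \mathbb{R}^2$, and suppose

$$\left[Df\begin{pmatrix} x\\ y \end{pmatrix}\right] \neq 0 \quad\text{for all } \begin{pmatrix} x\\ y \end{pmatrix} \in X.$$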

(a) Find an equation for the cone CX C R 3 over X, i.e., the union of all the 


lines through the origin and a point 


(j) (») 


ex. 


(b) If X has the equation y = x 3 , what is the equation of CX ? 

(c) Show that CX - {0} is a smooth surface. 

(d) What is the equation of the tangent plane to $CX$ at any $\mathbf{x} \in CX$?


3.1.18 (a) Find a parametrization for the union X of the lines through the 


origin and a point of the parametrized curve t •-» 



(b) Find an equation for the closure $\overline{X}$ of $X$. Is $\overline{X}$ exactly $X$?

(c) Show that X - {0} is a smooth surface. 


(d) Show that 
is another parametrization of $X$. In this form you should have no trouble giving
a name to the surface $X$.

(e) Relate $X$ to the set of non-invertible $2 \times 2$ matrices.


3.1.19 (a) 

equation / ( 


What is the equation of the tangent plane to the surface S of 
y ^ = 0 at the point ^ h ^ € 5? 


(b) Write the equations of the tangent planes Pi , P 2 . P3 to the surface of 
equation 2 = Ax 2 + By 2 at the points pi, p 2 , P 3 with x. y-coordi nates (q), 

(q), (5)1 an d find the point q 5= P x n P2 D P 3 . 

(c) What is the volume of the tetrahedron with vertices at pi, p 2 , P 3 and q? 


*3.1.20 Suppose $U \subset \mathbb{R}^2$ is open, $\mathbf{x}_0 \in U$ is a point and $\mathbf{f} : U \to \mathbb{R}^3$ is a
differentiable mapping with Lipschitz derivative. Suppose that $[D\mathbf{f}(\mathbf{x}_0)]$ is 1-1.

(a) Show that there are two basis vectors of $\mathbb{R}^3$ spanning a plane $E_1$ such
that if $P : \mathbb{R}^3 \to E_1$ denotes the projection onto the plane spanned by these
vectors, then $[D(P \circ \mathbf{f})(\mathbf{x}_0)]$ is invertible.

(b) Show that there exists a neighborhood $V \subset E_1$ of $(P \circ \mathbf{f})(\mathbf{x}_0)$ and a
mapping $\mathbf{g} : V \to \mathbb{R}^2$ such that $(P \circ \mathbf{f} \circ \mathbf{g})(\mathbf{y}) = \mathbf{y}$ for all $\mathbf{y} \in V$.

(c) Let $W = \mathbf{g}(V)$. Show that $\mathbf{f}(W)$ is the graph of $\mathbf{f} \circ \mathbf{g} : V \to E_2$, where $E_2$
is the line spanned by the third basis vector. Conclude that $\mathbf{f}(W)$ is a smooth
surface.


Exercises for Section 3.2: 
Manifolds 

The “unit sphere" has radius 
1; unless otherwise stated, it is 
always centered at the origin. 


3.2.1 Consider the space Xi of positions of a rod of length l in R 3 , where one 
endpoint is constrained to be on the x-axis, and the other is constrained to be 
on the unit sphere centered at the origin. 

(a) Give equations for X\ as a subset of R 4 , where the coordinates in R 4 are 
the x-coordinate of the end of the rod on the x-axis (call it f), and the three 
coordinates of the other end of the rod. 


(b) Show that near the point 
the equation of its tangent space. 



, the set Xi is a manifold, and give 


(c) Show that for $l \neq 1$, $X_l$ is a manifold.


3.2.2 Consider the space $X$ of positions of a rod of length 2 in $\mathbb{R}^3$, where one
endpoint is constrained to be on the sphere of equation $(x - 1)^2 + y^2 + z^2 = 1$,
and the other on the sphere of equation $(x + 1)^2 + y^2 + z^2 = 1$.
$$\begin{pmatrix} x_1\\ y_1\\ z_1\\ x_2\\ y_2\\ z_2 \end{pmatrix} = \begin{pmatrix} 1\\ 1\\ 0\\ -1\\ 1\\ 0 \end{pmatrix}$$

Point for Exercise 3.2.2, part (b).


(a) Give equations for $X$ as a subset of $\mathbb{R}^6$, where the coordinates in $\mathbb{R}^6$ are
the coordinates $\begin{pmatrix} x_1\\ y_1\\ z_1 \end{pmatrix}$ of the end of the rod on the first sphere, and the three
coordinates $\begin{pmatrix} x_2\\ y_2\\ z_2 \end{pmatrix}$ of the other end of the rod.

(b) Show that near the point in $\mathbb{R}^6$ shown in the margin,
the set $X$ is a manifold, and give the equation of its tangent space. What is
the dimension of $X$ near this point?

(c) Find the two points of $X$ near which $X$ is not a manifold.


When we say “parametrize” by
$\theta_1$, $\theta_2$, and the coordinates of $\mathbf{x}_1$,
we mean consider the positions of
the linkage as being determined by
those variables.


3.2.3 In Example 3.2.1, show that knowing Xj and x 3 determines exactly four 
positions of the linkage if the distance from Xi to x 3 is smaller than both /i 4-/2 
and h 4- U and greater than |/| - /3I and |/ 2 - /4|- 

3.2.4 (a) Parametrize the positions of the linkage of Example 3.2.1 by the
coordinates of $\mathbf{x}_1$, the polar angle $\theta_1$ of the first rod with the horizontal line
passing through $\mathbf{x}_1$, and the angle $\theta_2$ between the first and the second: four
numbers in all. For each value of $\theta_2$ such that

$$(l_3 - l_4)^2 < l_1^2 + l_2^2 - 2l_1l_2\cos\theta_2 < (l_3 + l_4)^2,$$

how many positions of the linkage are there?

(b) What happens if either of the inequalities in Equation 3.2.4 above is an
equality?


Hint for Exercise 3.2.6, part 
(a): This is the space of matri- 
ces A 0 such that det.4 — 0 
Hint for Exercise 3.2.6, part (b): 
If A € M 2 ( 3, 3), then det A ^ 0. 


3.2.5 In Example 3.2.1, describe $X_2$ and $X_3$ when $l_1 = l_2 + l_3 + l_4$.

3.2.6 In Example 3.2.1, let $M_k(n, m)$ be the space of $n \times m$ matrices of rank
$k$.

(a) Show that the space $M_1(2, 2)$ of $2 \times 2$ matrices of rank 1 is a manifold
embedded in $\operatorname{Mat}(2, 2)$.

(b) Show that the space $M_2(3, 3)$ of $3 \times 3$ matrices of rank 2 is a manifold
embedded in $\operatorname{Mat}(3, 3)$. Show (by explicit computation) that $[D\det(A)] = 0$ if
and only if $A$ has rank $< 2$.


Recall (Definition 1.2.18) that
a symmetric matrix is a matrix
that is equal to its transpose. An
antisymmetric matrix $A$ is a matrix
$A$ such that $A = -A^\top$.


*3.2.7 If $l_1 + l_2 = l_3 + l_4$, show that $X_2$ is not a manifold near the position
where all four points are aligned with $\mathbf{x}_2$ and $\mathbf{x}_4$ between $\mathbf{x}_1$ and $\mathbf{x}_3$.

*3.2.8 Let $O(n) \subset \operatorname{Mat}(n, n)$ be the set of orthogonal matrices, i.e., matrices
whose columns form an orthonormal basis of $\mathbb{R}^n$. Let $S(n, n)$ be the space of
symmetric $n \times n$ matrices, and $A(n, n)$ be the space of antisymmetric $n \times n$
matrices.
(a) Show that $A \in O(n)$ if and only if $A^\top A = I$.

(b) Show that if $A, B \in O(n)$, then $AB \in O(n)$ and $A^{-1} \in O(n)$.

(c) Show that $A^\top A - I \in S(n, n)$.

(d) Define $F : \operatorname{Mat}(n, n) \to S(n, n)$ to be $F(A) = AA^\top - I$, so that $O(n) = F^{-1}(0)$. Show that if $A$ is invertible, then $[DF(A)] : \operatorname{Mat}(n, n) \to S(n, n)$ is
onto.

(e) Show that $O(n)$ is a manifold embedded in $\operatorname{Mat}(n, n)$ and that $T_I O(n) = A(n, n)$.

*3.2.9 Let M k (n, m) be the space ofnxm matrices of rank k. 

(a) Show that M\ (n, m) is a manifold embedded in Mat (n, m) for all n, m > 
1. Hint: It is rather difficult to write equations for Afi(n,m), but it isn’t too 
hard to show that M x (n, m) is locally the graph of a mapping representing some 
variables as functions of others. For instance, suppose 



is a parametrization of the subset U\ C A/i(m, n) of those matrices whose first 
column is not 0. 

(b) Show that M\ (m,n) - U\ is a manifold embedded in What is 

its dimension? 

(c) How many parametrizations like y?i do you need to cover every point of 
Afi(m,n)? 


Exercises for Section 3.3: 
Taylor Polynomials 


3.3.1 For the function $f$ of Example 3.3.11, show that all first and second
partial derivatives exist everywhere, that the first partial derivatives are
continuous, and that the second partial derivatives are not.
3.3.2 Compute

Di(Dif), D 2 (D 3 f ), Di(DJ), and D x (D 2 (D 3 f)) 


for the function $f\begin{pmatrix} x\\ y\\ z \end{pmatrix} = x^2y + xy^2 + yz^2$.


3.3.3 Consider the function 

(s) 

- 0 - 0 - 

(a) Compute $D_1f$ and $D_2f$. Is $f$ of class $C^1$?

(b) Show that all second partial derivatives of / exist everywhere. 

(c) Show that 





(d) Why doesn’t this contradict Proposition 3.3.11? 


3.3.4 True or false? Suppose $f$ is a function on $\mathbb{R}^2$ that satisfies Laplace’s
equation $D_1^2f + D_2^2f = 0$. Then the function

$$g\begin{pmatrix} x\\ y \end{pmatrix} = f\begin{pmatrix} x/(x^2 + y^2)\\ y/(x^2 + y^2) \end{pmatrix}$$

also satisfies Laplace’s equation.


3.3.5 If $f\begin{pmatrix} x\\ y \end{pmatrix} = \varphi(x - y)$ for some twice continuously differentiable function
$\varphi : \mathbb{R} \to \mathbb{R}$, show that $D_1^2f - D_2^2f = 0$.


3.3.6 (a) Write out the polynomial 

5 

y; y a/x 7 , where 

m— 0 lei™ 

a (o,o,o) = 4, fl (o,i,o) = 3, fl(i, o, 2 ) = 4, a (i, 1 , 2 ) = 2, 

0(2, 2,0) = O( 3> 0,2) = 2, O^go.O) = 3, 

and all other aj = 0, for I G ZJ* for m < 5. 

(b) Use multi-exponent notation to write the polynomial 

2X2 + ^1^2 “ X1X2X3 + xj + 5x2X3. 

(c) Use multi-exponent notation to write the polynomial 

3 xix 2 - X2X3X4 + 2xjX 3 -l- x\x\ 4- X2. 
The object of this exercise is
to illustrate how long successive
derivatives become.


3.3.7 (a) Compute the derivatives of $(1 + f(x))^m$, up to and including the
fourth derivative.

(b) Guess how many terms the fifth derivative will have.

(c) Guess how many terms the $n$th derivative will have.

3.3.8 Prove Theorem 3.3.1. Hint: Compute

$$\lim_{h \to 0} \frac{f(a + h) - \left(f(a) + f'(a)h + \cdots + \dfrac{f^{(k)}(a)}{k!}h^k\right)}{h^k}$$

by differentiating, $k$ times, the top and bottom with respect to $h$, and checking
each time that the hypotheses of l’Hôpital’s rule are satisfied.


3.3.9 (a) Redo Example 3.3.16, finding the Taylor polynomial of degree 3. 
(b) Repeat, for degree 4. 

3.3.10 Following the format of Example 3.3.16, write the terms of the Taylor 
polynomial of degree 2, of a function / with three variables, at a. 


Exercise 3.3.13 uses Taylor’s
theorem with remainder in one
dimension, Theorem A9.1, stated
and proved in Appendix A9.


3.3.11 Find the Taylor polynomial of degree 3 of the function

$$f\begin{pmatrix} x\\ y\\ z \end{pmatrix} = \sin(x + y + z) \quad\text{at the point}\quad \begin{pmatrix} \pi/6\\ \pi/4\\ \pi/3 \end{pmatrix}.$$

3.3.12 Find the Taylor polynomial of degree 2 of the function 

/ ( y ) = \Jx + y + xy at the point ( _3 ) ' 

3.3.13 Let $f(x) = e^x$, so that $f(1) = e$. (a) Use Corollary A9.3 (a bound for the
remainder of a Taylor polynomial in one dimension) to show that

$$e = \sum_{i=0}^{k} \frac{1}{i!} + r_{k+1}, \quad\text{where}\quad |r_{k+1}| < \frac{3}{(k+1)!}.$$

(b) Prove that $e$ is irrational: if $e = a/b$ for some integers $a$ and $b$, deduce from
part (a) that

$$|k!\,a - bm| < \frac{3b}{k+1}, \quad\text{where $m$ is the integer}\quad \frac{k!}{0!} + \frac{k!}{1!} + \frac{k!}{2!} + \cdots + \frac{k!}{k!}.$$

Conclude that if $k$ is large enough, then $k!\,a - bm$ is an integer that is arbitrarily
small, and therefore 0.

(c) Finally, observe that $k$ does not divide $m$ evenly, since it does divide
every summand but the last one. Since $k$ may be freely chosen, provided only
that it is sufficiently large, take $k$ to be a prime number larger than $b$. Then
in $k!\,a = bm$ we have that $k$ divides the left side, but does not divide $m$. What
conclusion do you reach?
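The bound in part (a) can be explored numerically before it is proved. The Python sketch below compares the remainder $e - \sum_{i=0}^{k} 1/i!$ with $3/(k+1)!$ for small $k$; the function name is an assumption of this illustration, and floating-point arithmetic limits the comparison to moderate values of $k$.

```python
import math

def partial_sum(k):
    """Sum of 1/i! for i = 0, ..., k."""
    return sum(1.0 / math.factorial(i) for i in range(k + 1))

for k in range(11):
    remainder = math.e - partial_sum(k)
    bound = 3.0 / math.factorial(k + 1)
    print(k, remainder, bound, 0 < remainder < bound)
```

A table of this kind is evidence for the inequality, not a substitute for the argument based on Corollary A9.3.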


3.3.14 Let / be the function 

f(y) =s s n (y) 

where sgn(y) is the sign of y, i.e., +1 when y > 0, 0 when y = 0 and —1 when 
y < 0. 

Note: Part (a) is almost obvi- (a) Show that / is continuously differentiable on the complement of the half- 
ous except when y = 0,x > 0, line y = 0,x < 0. 

where y changes sign. It may help f 0 1 

to show that this mapping can be (b) Show that if a — (Z.I ) and h = , then although both a and a + h 

written (r, 9) v /rsin(0/2) in po- . . \ 

lar coordinates. are * n domain of definition of /, Taylor’s theorem with remainder (Theorem 

A9.5) is not true. 

(c) What part of the statement is violated? Where does the proof fail? 

3.3.15 Show that if / 6 2™, then (xh) / = x m h / . 

*3.3.16 A homogeneous polynomial in two variables of degree four is an 
expression of the form 

p(x, y) = ax 4 + bx 3 y + cx 2 y 2 + dxy 3 + ey 4 . 



A homogeneous polynomial is a 
polynomial in which all terms have 
the same degree. 


Consider the function 




» (:) - o 
«(:)-(:)• 


where p is a homogeneous polynomial of degree 4. What condition must the co- 
efficients of / satisfy in order for the crossed partials D\(D 2 (f)) and D 2 (Di(f)) 
to be equal at the origin? 


Exercises for Section 3.4: 

Rules for Computing 
Taylor Polynomials 

Hint for Exercise 3.4.2, part
(a): it is easier to substitute $x + y^2$
in the Taylor polynomial for
$\sin u$ than to compute the partial
derivatives. Hint for part (b):
Same as above, except that you
should use the Taylor polynomial
of $1/(1 + u)$.


3.4.1 Prove the formulas of Proposition 3.4.2. 

3.4.2 (a) What is the Taylor polynomial of degree 3 of $\sin(x + y^2)$ at the
origin?

(b) What is the Taylor polynomial of degree 4 of $1/(1 + x^2 + y^2)$ at the origin?

3.4.3 Write, to degree 2, the Taylor polynomial of

$$f\begin{pmatrix} x\\ y \end{pmatrix} = \sqrt{1 + \sin(x + y)} \quad\text{at the origin.}$$



Hint for Exercise 3.4.5, part
(a): this is easier if you use $\sin(\alpha + \beta) = \sin\alpha\cos\beta + \cos\alpha\sin\beta$.


Exercises for Section 3.5: 
Quadratic Forms 


Exercise 3.5.4: by “represents
the quadratic form” we mean that
$Q$ can be written as $\mathbf{x} \cdot A\mathbf{x}$ (see
Proposition 3.7.11).
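3.4.4 Write, to degree 3, the Taylor polynomial $P^3_{f,\mathbf{0}}$ of

$$f\begin{pmatrix} x\\ y \end{pmatrix} = \cos(1 + \sin(x^2 + y)) \quad\text{at the origin.}$$

3.4.5 (a) What is the Taylor polynomial of degree 2 of the function $f\begin{pmatrix} x\\ y \end{pmatrix} = \sin(2x + y)$ at the point $\begin{pmatrix} \pi/6\\ \pi/3 \end{pmatrix}$?

(b) Show that

$$f\begin{pmatrix} x\\ y \end{pmatrix} + \frac12\left(2x + y - \frac{2\pi}{3}\right) - \left(x - \frac{\pi}{6}\right)^2$$

has a critical point at $\begin{pmatrix} \pi/6\\ \pi/3 \end{pmatrix}$. What kind of critical point is it?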

3.5.1 Let $V$ be a vector space. A symmetric bilinear function on $V$ is a
mapping $B : V \times V \to \mathbb{R}$ such that

(1) $B(a\mathbf{v}_1 + b\mathbf{v}_2, \mathbf{w}) = aB(\mathbf{v}_1, \mathbf{w}) + bB(\mathbf{v}_2, \mathbf{w})$ for all $\mathbf{v}_1, \mathbf{v}_2, \mathbf{w} \in V$ and
$a, b \in \mathbb{R}$;

(2) $B(\mathbf{v}, \mathbf{w}) = B(\mathbf{w}, \mathbf{v})$ for all $\mathbf{v}, \mathbf{w} \in V$.

(a) Show that if $A$ is a symmetric $n \times n$ matrix, the mapping $B_A(\mathbf{v}, \mathbf{w}) = \mathbf{v}^\top A\mathbf{w}$ is a symmetric bilinear function.

(b) Show that every symmetric bilinear function on $\mathbb{R}^n$ is of the form $B_A$ for
a unique symmetric matrix $A$.

(c) Let $P_k$ be the space of polynomials of degree at most $k$. Show that the
function $B : P_k \times P_k \to \mathbb{R}$ given by $B(p, q) = \int_0^1 p(t)q(t)\,dt$ is a symmetric
bilinear function.

(d) Denote by $p_1(t) = 1, p_2(t) = t, \ldots, p_{k+1}(t) = t^k$ the usual basis of $P_k$, and
by $\Phi_p$ the corresponding “concrete to abstract” linear transformation. Show
that $B(\Phi_p(\mathbf{a}), \Phi_p(\mathbf{b}))$ is a symmetric bilinear function on $\mathbb{R}^n$, and find its matrix.

3.5.2 If $B$ is a symmetric bilinear function, denote by $Q_B : V \to \mathbb{R}$ the
function $Q(\mathbf{v}) = B(\mathbf{v}, \mathbf{v})$. Show that every quadratic form on $\mathbb{R}^n$ is of the form
$Q_B$ for some bilinear function $B$.


3.5.3 Show that

$$Q(p) = \int_0^1 (p(t))^2\,dt$$

(see Example 3.5.2) is a quadratic form if $p$ is a cubic polynomial, i.e., if $p(t) = a_0 + a_1t + a_2t^2 + a_3t^3$.


3.5.4 Confirm that the symmetric matrix $A = \begin{bmatrix} 1 & 0 & 1/2\\ 0 & 0 & -1/2\\ 1/2 & -1/2 & -1 \end{bmatrix}$ represents the quadratic form $Q = x^2 + xz - yz - z^2$.
3.5.5 (a) Let $P_k$ be the space of polynomials of degree at most $k$. Show that
the function

$$Q(p) = \int_0^1 (p(t))^2 - (p'(t))^2 \, dt$$
is a quadratic form on $P_k$.

(b) What is the signature of $Q$ when $k = 2$?

3.5.6 Let P k be the space of polynomials of degree at most k. 

(a) Show that the function $\delta_a : P_k \to \mathbb{R}$ given by $\delta_a(p) = p(a)$ is a linear
function.

(b) Show that $\delta_0, \dots, \delta_k$ are linearly independent. First say what it means,
being careful with the quantifiers. It may help to think of the polynomial

$$x(x - 1) \cdots (x - (j - 1))(x - (j + 1)) \cdots (x - k),$$

which vanishes at $0, 1, \dots, j - 1, j + 1, \dots, k$ but not at $j$.

(c) Show that the function

$$Q(p) = (p(0))^2 - (p(1))^2 + \cdots + (-1)^k (p(k))^2$$

is a quadratic form on $P_k$. When $k = 3$, write it in terms of the coefficients of
$p(x) = ax^3 + bx^2 + cx + d$.

(d) What is the signature of $Q$ when $k = 3$? There is the smart way, and
then there is the plodding way . . . 

3.5.7 For the quadratic form of Example 3.5.6, 

$$Q(\mathbf{x}) = x^2 + 2xy - 4xz + 2yz - 4z^2,$$

(a) What decomposition into a sum of squares do you find if you start by 
eliminating the z terms, then the y terms, and finally the x terms? 

(b) Complete the square starting with the x terms, then the y terms, and 
finally the z terms. 

3.5.8 Consider the quadratic form of Example 3.5.7: 

$$Q(\mathbf{x}) = xy - xz + yz.$$

(a) Verify that the decomposition

$$(x/2 + y/2)^2 - (x/2 - y/2 + z)^2 + z^2$$
is indeed composed of linearly independent functions.

(b) Decompose $Q(\mathbf{x})$ with a different choice of $u$, to support the statement
that $u = x - y$ was not a magical choice.

3.5.9 Are the following quadratic forms degenerate or nondegenerate? 

(a) $x^2 + 4xy + 4y^2$ on $\mathbb{R}^2$.

(b) $x^2 + 2xy + 2y^2 + 2yz + z^2$ on $\mathbb{R}^3$.

(c) $2x^2 + 2y^2 + z^2 + w^2 + 4xy + 2xz - 2xw - 2yw$ on $\mathbb{R}^4$.


Hint for Exercise 3.5.14: The
main point is to prove that if the
quadratic form $Q$ has signature
$(k, 0)$ with $k < n$, there is a vector
$\mathbf{v} \neq \mathbf{0}$ such that $Q(\mathbf{v}) = 0$.
You can find such a vector using
the transformation $T$ of Equation
3.5.26.


Exercise 3.5.16: See margin
note for Exercise 3.5.4.

3.5.10 Decompose each of the following quadratic forms by completing
squares, and determine its signature.

(a) $x^2 + xy - 3y^2$ (b) $x^2 + 2xy - y^2$ (c) $x^2 + xy + yz$ (d) $xy + yz$


3.5.11 What is the signature of the following quadratic forms?

(a) $x^2 + xy$ on $\mathbb{R}^2$ (b) $xy + yz$ on $\mathbb{R}^3$

(c) $\det\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ on $\mathbb{R}^4$ *(d) $x_1x_2 + x_2x_3 + \cdots + x_{n-1}x_n$ on $\mathbb{R}^n$


3.5.12 On $\mathbb{R}^4$ as described by $M = \begin{bmatrix} a & c \\ b & d \end{bmatrix}$, consider the quadratic form
$Q(M) = \det M$. What is its signature?


3.5.13 Consider again $Q(M) = \operatorname{tr}(M^2)$, operating on the space of upper
triangular matrices described by $M = \begin{bmatrix} a & b \\ 0 & d \end{bmatrix}$.

(a) What kind of surface in $\mathbb{R}^3$ do you get by setting $Q(M^2) = 1$?

(b) What kind of surface in $\mathbb{R}^3$ do you get by setting $Q(MM^\top) = 1$?


3.5.14 Show that a quadratic form on $\mathbb{R}^n$ is positive definite if and only if its
signature is $(n, 0)$.

3.5.15 Here is an alternative proof of Proposition 3.5.14. Let $Q : \mathbb{R}^n \to \mathbb{R}$
be a positive definite quadratic form. Show that there exists a constant $C > 0$
such that

$$Q(\mathbf{x}) \geq C|\mathbf{x}|^2 \tag{3.5.30}$$

for all $\mathbf{x} \in \mathbb{R}^n$, as follows.

(a) Let $S^{n-1} = \{\mathbf{x} \in \mathbb{R}^n \mid |\mathbf{x}| = 1\}$. Show that $S^{n-1}$ is compact, so there
exists $\mathbf{x}_0 \in S^{n-1}$ with $Q(\mathbf{x}_0) \leq Q(\mathbf{x})$ for all $\mathbf{x} \in S^{n-1}$.

(b) Show that $Q(\mathbf{x}_0) > 0$.

(c) Use the formula $Q(\mathbf{x}) = |\mathbf{x}|^2 Q(\mathbf{x}/|\mathbf{x}|)$ to prove Proposition 3.5.14.


3.5.16 Show that a $2 \times 2$ symmetric matrix $G = \begin{bmatrix} a & b \\ b & d \end{bmatrix}$ represents a positive
definite quadratic form if and only if $\det G > 0$, $a + d > 0$.


3.5.17 Consider the vector space of Hermitian $2 \times 2$ matrices:

$$H = \begin{bmatrix} a & b + ic \\ b - ic & d \end{bmatrix}.$$

What is the signature of the quadratic form $Q(H) = \det H$?


3.5.18 Identify and sketch the conic sections and quadratic surfaces repre- 
sented by the quadratic forms defined by the following matrices: 
(a) $\begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}$ (b) $\begin{bmatrix} 2 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 2 \end{bmatrix}$ (c) $\begin{bmatrix} 2 & 0 & 3 \\ 0 & 0 & 0 \\ 3 & 0 & -1 \end{bmatrix}$

(d) 

‘-I o' 

1 4 

(e) 

2 4 

4 1 

-3' 

3 

(0 

‘l 2' 
2 4 




-3 3 

-1 




3.5.19 Determine the signature of each of the following quadratic forms. 
Where possible, sketch the curve or surface represented by the equation. 

(a) $x^2 + xy - y^2 = 1$ (b) $x^2 + 2xy - y^2 = 1$

(c) $x^2 + xy + yz = 1$ (d) $xy + yz = 1$


Exercises for Section 3.6: 
Classifying Critical Points 


3.6.1 (a) Show that the function $f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = x^2 + xy + z^2 - \cos y$ has a critical
point at the origin.

(b) What kind of critical point does it have? 


3.6.2 Find all the critical points of the following functions: 

(a) $\sin x \cos y$ (b) $2x^3 - 24xy + 16y^3$

(c) xy + ^ i *(d) sin x + sin y + sin(x + y) 

For each function, find the second degree approximation at the critical points, 
and classify the critical point. 


3.6.3 Complete the proof of Theorem 3.6.8 (behavior of functions near saddle
points), showing that if $f$ has a saddle at $\mathbf{a} \in U$, then in every neighborhood of
$\mathbf{a}$ there are points $\mathbf{c}$ with $f(\mathbf{c}) < f(\mathbf{a})$.

3.6.4 (a) Find the critical points of the function $f\begin{pmatrix} x \\ y \end{pmatrix} = x^3 - 12xy + 8y^3$.
(b) Determine the nature of each of the critical points.


3.6.5 Use Newton’s method (preferably by computer) to find the critical 
points of -i 3 + y 3 + xy + 4x - 5 y. Classify them, still using the computer. 


3.6.6 (a) Find the critical points of the function $f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = xy + yz - xz + xyz$.


(b) Determine the nature of each of the critical points. 

3.6.7 (a) Find the critical points of the function $f\begin{pmatrix} x \\ y \end{pmatrix} = 3x^2 - 6xy + 2y^3$.

(b) What kind of critical points are these?
Exercises for Section 3.7:

Constrained Extrema
and Lagrange Multipliers


3.7.1 Show that the mapping

$$g : \begin{pmatrix} u \\ v \end{pmatrix} \mapsto \begin{pmatrix} \sin uv + u \\ u + v \\ uv \end{pmatrix}$$

is a parametrization of a smooth surface.

(a) Show that the image of $g$ is contained in the locus $S$ of equation

$$z = (x - \sin z)(\sin z - x + y).$$

(b) Show that $S$ is a smooth surface.

(c) Show that $g$ maps $\mathbb{R}^2$ onto $S$.

(d) Show that $g$ is one to one, and that $\bigl[\mathbf{D}g\begin{pmatrix} u \\ v \end{pmatrix}\bigr]$ is one to one for every
$\begin{pmatrix} u \\ v \end{pmatrix} \in \mathbb{R}^2$.


Hint for Exercise 3.7.2, part
(b): The tangent plane to $Y$ at
any point is always parallel to the
$y$-axis.

3.7.2 (a) Show that the function $f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = x + y + z$ constrained to the
surface $Y$ of equation $x = \sin z$ has no critical point.

(b) Explain geometrically why this is so. 

3.7.3 (a) Show that the function $xyz$ has four critical points on the plane of
equation

$$ax + by + cz - 1 = 0$$

when $a, b, c > 0$. (Use the equation of the plane to write $z$ in terms of $x$ and
$y$; i.e., parametrize the plane by $x$ and $y$.)

(b) Show that of these four critical points, three are saddles and one is a 
maximum. 


3.7.4 Let $Q(\mathbf{x})$ be a quadratic form. Construct a symmetric matrix $A$ as
follows: each entry $A_{i,i}$ on the diagonal is the coefficient of $x_i^2$, while each entry
$A_{i,j}$ is one-half the coefficient of the term $x_i x_j$.

(a) Show that $Q(\mathbf{x}) = \mathbf{x} \cdot A\mathbf{x}$.

(b) Show that $A$ is the unique symmetric matrix with this property. Hint:
consider $Q(\mathbf{e}_i)$, and $Q(a\mathbf{e}_i + b\mathbf{e}_j)$.
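
The construction in Exercise 3.7.4 is mechanical enough to try on a computer before proving anything. The following is a minimal sketch, not part of the text: the function names and the dictionary encoding of the coefficients (0-based index pairs with $i \leq j$) are assumptions made for illustration. It builds $A$ for the quadratic form $x^2 + xz - yz - z^2$ of Exercise 3.5.4 and compares $Q(\mathbf{x})$ with $\mathbf{x} \cdot A\mathbf{x}$ at one point. A numerical check of this kind is evidence, not a proof.

```python
from fractions import Fraction

def symmetric_matrix(n, coeffs):
    """Build A from {(i, j): coefficient of x_i x_j}, with 0-based i <= j."""
    A = [[Fraction(0)] * n for _ in range(n)]
    for (i, j), c in coeffs.items():
        if i == j:
            A[i][i] = Fraction(c)                 # coefficient of x_i^2
        else:
            A[i][j] = A[j][i] = Fraction(c) / 2   # half the coefficient of x_i x_j
    return A

def quadratic_form(coeffs, x):
    """Evaluate Q(x) directly from its coefficients."""
    return sum(c * x[i] * x[j] for (i, j), c in coeffs.items())

def dot_Ax(A, x):
    """Evaluate x . Ax as a double sum over the entries of A."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

# Q = x^2 + xz - yz - z^2, with (x, y, z) stored as (x_0, x_1, x_2)
coeffs = {(0, 0): 1, (0, 2): 1, (1, 2): -1, (2, 2): -1}
A = symmetric_matrix(3, coeffs)
x = [Fraction(2), Fraction(-3), Fraction(5)]
print(quadratic_form(coeffs, x), dot_Ax(A, x))
```

Exact rational arithmetic (`Fraction`) is used so that the half-coefficients do not introduce rounding.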

3.7.5 Justify Equation 3.7.32, using the definition of the derivative and the 
fact that A is symmetric. 

3.7.6 Let A be any matrix (not necessarily square). 
Part (c) of Exercise 3.7.6 uses
the norm $\|A\|$ of a matrix $A$. The
norm is defined (Definition 2.8.5)
in an optional subsection of Section
2.8.


(a) Show that $AA^\top$ is symmetric.

(b) Show that all eigenvalues $\lambda$ of $AA^\top$ are non-negative, and that they are
all positive if and only if the kernel of $A$ is $\{\mathbf{0}\}$.

(c) Show that

$$\|A\| = \sup_{\lambda \text{ eigenvalue of } AA^\top} \sqrt{\lambda}.$$

3.7.7 Find the minimum of the function $x^3 + y^3 + z^3$ on the intersection of
the planes of equation $x + y + z = 2$ and $x + y - z = 3$.


3.7.8 Find all the critical points of the function

$f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = 2xy + 2yz - 2x^2 - 2y^2 - 2z^2$ on the unit sphere of $\mathbb{R}^3$.

3.7.9 What is the volume of the largest rectangular parallelepiped contained
in the ellipsoid

$$x^2 + 4y^2 + 9z^2 \leq 9?$$

3.7.10 Let $A, B, C, D$ be a convex quadrilateral in the plane, with the vertices
free to move but with $a$ the length of $AB$, $b$ the length of $BC$, $c$ the length of
$CD$ and $d$ the length of $DA$ all assigned. Let $\varphi$ be the angle at $A$ and $\psi$ be the
angle at $C$.

(a) Show that the angles $\varphi$ and $\psi$ satisfy the constraint

$$a^2 + d^2 - 2ad\cos\varphi = b^2 + c^2 - 2bc\cos\psi.$$

(b) Find a formula for the area of the quadrilateral in terms of $\varphi$, $\psi$ and
$a, b, c, d$.

(c) Show that the area is maximum if the quadrilateral can be inscribed in
a circle. You may use the fact that a quadrilateral can be inscribed in a circle
if the opposite angles add to $\pi$.



3.7.11 Find the minimum of the function $x^3 + y^3 + z^3$ on the intersection of
the planes of equation

$$x + y + z = 2 \quad \text{and} \quad x + y - z = 3.$$

3.7.12 What is the maximum volume of a box of surface area 10, for which
one side is exactly twice as long as another?

3.7.13 What is the maximum of $xyz$, if $x, y, z$ belong to the surface of equation
$x + y + z^2 = 16$?

3.7.14 (a) If / ^ ^ = a + 6x 4- cy, what are 

ni f { X y)^ and jfjft'ffijW 




Figure 3.7.16. 
2

(b) Let / be as above. What is the minimum of / Q 2 (/ (y)) \dxdy\ 
among all functions / such that 

f 0 f 0 s { X y)^ = l - 

3.7.15 (a) Show that the set $X \subset \mathrm{Mat}(2, 2)$ of matrices with determinant 1
is a smooth submanifold. What is its dimension?

fO ll 

(b) Find a matrix in X which is closest to the matrix « . 


**3.7.16 Let $D$ be the closed domain bounded by the line of equation $x + y = 0$
and the circle of equation $x^2 + y^2 = 1$, whose points satisfy $x \geq -y$, as shaded
in Figure 3.7.16.

(a) Find the maximum and minimum of the function f 



(b) Try it again with $f\begin{pmatrix} x \\ y \end{pmatrix} = x + 5xy$.


Exercises for Section 3.8: 

Geometry of Curves 
and Surfaces 

Useful fact for Exercise 3.8.1 

The arctic circle is those points 
that are 2607.5 kilometers south 
of the north pole. 


3.8.1 (a) How long is the arctic circle? How long would a circle of that radius 

be if the earth were flat? 

(b) How big a circle around the pole would you need to measure in order 
for the difference of its length and the corresponding length in a plane to be 
one kilometer? 


3.8.2 Suppose $\gamma(t) = \begin{pmatrix} \gamma_1(t) \\ \vdots \\ \gamma_n(t) \end{pmatrix}$ is twice continuously differentiable on a
neighborhood of $[a, b]$.

(a) Use Taylor’s theorem with remainder (or argue directly from the mean
value theorem) to show that for any $s_1 < s_2$ in $[a, b]$, we have

$$|\gamma(s_2) - \gamma(s_1) - \gamma'(s_1)(s_2 - s_1)| \leq C|s_2 - s_1|^2, \quad \text{where} \quad C = \sqrt{n} \sup_{j = 1, \dots, n} \, \sup_{t \in [a, b]} |\gamma_j''(t)|.$$

(b) Use this to show that

$$\lim \sum_{i=0}^{m-1} |\gamma(t_{i+1}) - \gamma(t_i)| = \int_a^b |\gamma'(t)| \, dt,$$

where $a = t_0 < t_1 < \cdots < t_m = b$, and we take the limit as the distances $t_{i+1} - t_i$
tend to 0.


3.8.3 Check that if you consider the surface of equation $z = f(x)$, $y$ arbitrary,
and the plane curve $z = f(x)$, the mean curvature of the surface is half the
curvature of the plane curve.

3.8.4 (a) Show that the equation $y\cos z = x\sin z$ expresses $z$ implicitly as a
function $z = g_r\begin{pmatrix} x \\ y \end{pmatrix}$ near the point $\begin{pmatrix} x_0 \\ y_0 \end{pmatrix} = \begin{pmatrix} r \\ 0 \end{pmatrix}$ when $r \neq 0$.

(b) Show that $D_1 g_r = D_1^2 g_r = 0$. (Hint: The $x$-axis is contained in the
surface.)

3.8.5 Compute the curvature of the surface of equation $z = \sqrt{x^2 + y^2}$ at
$\begin{pmatrix} a \\ b \\ \sqrt{a^2 + b^2} \end{pmatrix}$. Explain your result.

3.8.6 (a) Draw the cycloid, given parametrically by

$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a(t - \sin t) \\ a(1 - \cos t) \end{pmatrix}.$$

(b) Can you relate the name “cycloid” to “bicycle”?

(c) Find the length of one arc of the cycloid.

3.8.7 Do the same for the hypocycloid

$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a\cos^3 t \\ a\sin^3 t \end{pmatrix}.$$



Hint for Exercise *3.8.9: The
curve

$$[\mathbf{t}(t), \mathbf{n}(t), \mathbf{b}(t)] = T(t)$$

is a mapping $I \to SO(3)$, so
$t \mapsto T^{-1}(t_0)T(t)$ is a curve in $SO(3)$
passing through the identity at $t_0$.


3.8.8 (a) Let $f : [a, b] \to \mathbb{R}$ be a smooth function satisfying $f(x) > 0$, and
consider the surface obtained by rotating its graph around the $x$-axis. Show
that the Gaussian curvature $K$ and the mean curvature $H$ of this surface depend
only on the $x$-coordinate.

(b) Show that

$$K(x) = \frac{-f''(x)}{f(x)\bigl(1 + (f'(x))^2\bigr)^2}.$$

(c) Find a formula for the mean curvature in terms of $f$ and its derivatives.

*3.8.9 Use Exercise *3.2.8 to explain why the Frenet formulas give an antisymmetric
matrix.

*3.8.10 Using the notation and the computations in the proof of Proposition
3.8.7, show that the mean curvature is given by the formula

$$H = \frac{1}{2(1 + a_1^2 + a_2^2)^{3/2}} \bigl(a_{2,0}(1 + a_2^2) - 2a_1a_2a_{1,1} + a_{0,2}(1 + a_1^2)\bigr). \tag{3.8.34}$$




4 

Integration 


When you can measure what you are speaking about, and express it in
numbers, you know something about it; but when you cannot measure
it, when you cannot express it in numbers, your knowledge is of a meager
and unsatisfactory kind: it may be the beginning of knowledge, but
you have scarcely, in your thoughts, advanced to the stage of science.
-William Thomson, Lord Kelvin


4.0 Introduction 


An actuary deciding what pre- 
mium to charge for a life insurance 
policy needs integrals. So does 
a bank deciding what to charge 
for stock options. Black and Sc- 
holes received a Nobel prize for 
this work, which involves a very 
fancy stochastic integral. 


Chapters 1 and 2 began with algebra, then moved on to calculus. Here, as in 
Chapter 3, we dive right into calculus. We introduce the relevant linear algebra 
(determinants) later in the chapter, where we need it. 

When students first meet integrals, integrals come in two very different fla- 
vors: Riemann sums (the idea) and anti-derivatives (the recipe), rather as 
derivatives arise as limits, and as something to be computed using Leibnitz’s 
rule, the chain rule, etc. 

Since integrals can be systematically computed (by hand) only as anti- 
derivatives, students often take this to be the definition. This is misleading: 
the definition of an integral is given by a Riemann sum (or by “area under the 
graph”; Riemann sums are just a way of making the notion of “area” precise). 
Section 4.1 is devoted to generalizing Riemann sums to functions of several 
variables. Rather than slicing up the domain of a function $f : \mathbb{R} \to \mathbb{R}$ into little
intervals and computing the “area under the graph” corresponding to each interval,
we will slice up the “$n$-dimensional domain” of a function $f : \mathbb{R}^n \to \mathbb{R}$
into little $n$-dimensional cubes.

Computing n-dimensional volume is an important application of multiple 
integrals. Another is probability theory; in fact probability has become such 
an important part of integration that integration has almost become a part of 
probability. Even such a mundane problem as quantifying how heavy a child 
is for his or her height requires multiple integrals. Fancier yet are the uses of 
probability that arise when physicists study turbulent flows, or engineers try 
to improve the internal combustion engine. They cannot hope to deal with one 
molecule at a time; any picture they get of reality at a macroscopic level is 
necessarily based on a probabilistic picture of what is going on at a microscopic 
level. We give a brief introduction to this important field in Section 4.2. 
Section 4.3 discusses what functions are integrable; in the optional Section
4.4, we use the notion of measure to give a sharper criterion for integrability (a
criterion that applies to more functions than the criteria of Section 4.3).

In Section 4.5 we discuss Fubini's theorem, which reduces computing the
integral of a function of $n$ variables to computing $n$ ordinary integrals. This is an
important theoretical tool. Moreover, whenever an integral can be computed in
elementary terms, Fubini's theorem is the key tool. Unfortunately, it is usually
impossible to compute anti-derivatives in elementary terms even for functions
of one variable, and this tends to be truer yet of functions of several variables.

In practice, multiple integrals are most often computed using numerical 
methods, which we discuss in Section 4.6. We will see that although the theory 
is much the same in $\mathbb{R}^2$ or $\mathbb{R}^{10^{24}}$, the computational issues are quite different.
We will encounter some entertaining uses of Newton’s method when looking for 
optimal points at which to evaluate a function, and some fairly deep probability 
in understanding why the Monte Carlo methods work in higher dimensions. 

Defining volume using dyadic pavings, as we do in Section 4.1, makes most 
theorems easiest to prove, but such pavings are rigid; often we will want to 
have more “paving stones” where the function varies rapidly, and bigger ones 
elsewhere. Having some flexibility in choosing pavings is also important for the 
proof of the change of variables formula. Section 4.7 discusses more general
pavings. 

In Section 4.8 we return to linear algebra to discuss higher-dimensional de- 
terminants. In Section 4.9 we show that in all dimensions the determinant 
measures volumes; we use this fact in Section 4.10, where we discuss the change 
of variables formula. 

Many of the most interesting integrals, such as those in Laplace and Fourier 
transforms, are not integrals of bounded functions over bounded domains. We 
will discuss these improper integrals in Section 4.11. Such integrals cannot be 
defined as Riemann sums, and require understanding the behavior of integrals 
under limits. The dominated convergence theorem is the key tool for this. 

4.1 Defining the Integral 


The Greek letter $\rho$, or “rho,” is
pronounced “row.”


Integration is a summation procedure; it answers the question: how much is
there in all? In one dimension, $\rho(x)$ might be the density at point $x$ of a bar
parametrized by $[a, b]$; in that case

$$\int_{[a,b]} \rho(x) \, |dx| \tag{4.1.1}$$

is the total mass of the bar.

If instead we have a rectangular plate parametrized by $a \leq x \leq b$, $c \leq y \leq d$,
and with density $\rho\begin{pmatrix} x \\ y \end{pmatrix}$, then the total mass will be given by the double integral

$$\iint_{[a,b] \times [c,d]} \rho\begin{pmatrix} x \\ y \end{pmatrix} |dx \, dy|, \tag{4.1.2}$$
We will see in Section 4.5 that
the double integral of Equation
4.1.2 can be written

$$\int_c^d \left( \int_a^b \rho\begin{pmatrix} x \\ y \end{pmatrix} dx \right) dy.$$

We are not presupposing this
equivalence in this section. One
difference worth noting is that $\int_a^b$
specifies a direction: from $a$ to
$b$. (You will recall that direction
makes a difference: $\int_a^b = -\int_b^a$.)
Equation 4.1.2 specifies a domain,
but says nothing about direction.


Figure 4.1.1.

The function that is rainfall
over Britain and 0 elsewhere is discontinuous
at the coast.


where $[a, b] \times [c, d]$, i.e., the plate, is the domain of the entire double integral
$\iint$.

We will define such multiple integrals in this chapter. But you should always 
remember that the example above is too simple. One might want to understand 
the total rainfall in Britain, whose coastline is a very complicated boundary. (A 
celebrated article analyzes that coastline as a fractal, with infinite length.) Or 
one might want to understand the total potential energy stored in the surface 
tension of a foam; physics tells us that a foam assumes the shape that minimizes 
this energy.

Thus we want to define integration for rather bizarre domains and functions. 
Our approach will not work for truly bizarre functions, such as the function 
that equals 1 at all rational numbers and 0 at all irrational numbers; for that 
one needs Lebesgue integration, not treated in this book. But we still have to 
specify carefully what domains and functions we want to allow. 

Our task will be somewhat easier if we keep the domain of integration simple, 
putting all the complication into the function to be integrated. If we wanted 
to sum rainfall over Britain, we would use $\mathbb{R}^2$, not Britain (with its fractal
coastline!) as the domain of integration; we would then define our function to 
be rainfall over Britain, and 0 elsewhere. 

Thus, for a function $f : \mathbb{R}^n \to \mathbb{R}$, we will define the multiple integral

$$\int_{\mathbb{R}^n} f(\mathbf{x}) \, |d^n\mathbf{x}|, \tag{4.1.3}$$

with $\mathbb{R}^n$ the domain of integration.

We emphatically do not want to assume that $f$ is continuous, because most
often it is not: if for example $f$ is defined to be total rainfall for October over
Britain, and 0 elsewhere, it will be discontinuous over most of the border of
Britain, as shown in Figure 4.1.1. What we actually have is a function $g$ (e.g.,
rainfall) defined on some subset of $\mathbb{R}^n$ larger than Britain. We then consider
that function only over Britain, by setting

$$f(\mathbf{x}) = \begin{cases} g(\mathbf{x}) & \text{if } \mathbf{x} \in \text{Britain} \\ 0 & \text{otherwise.} \end{cases} \tag{4.1.4}$$


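We can express this another way, using the characteristic function $\chi$.


The characteristic function $\chi_A$
is pronounced “kye sub A,” the
symbol $\chi$ being the Greek letter
chi.


Definition 4.1.1 (Characteristic function). For any bounded subset
$A \subset \mathbb{R}^n$, the characteristic function $\chi_A$ is:

$$\chi_A(\mathbf{x}) = \begin{cases} 1 & \text{if } \mathbf{x} \in A \\ 0 & \text{if } \mathbf{x} \notin A. \end{cases} \tag{4.1.5}$$

Equation 4.1.4 can then be rewritten

$$f(\mathbf{x}) = g(\mathbf{x})\chi_{\text{Britain}}(\mathbf{x}). \tag{4.1.6}$$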
We tried several notations before
choosing $|d^n\mathbf{x}|$. First we used
$dx_1 \dots dx_n$. That seemed clumsy,
so we switched to $dV$. But it failed
to distinguish between $|d^2\mathbf{x}|$ and
$|d^3\mathbf{x}|$, and when changing variables
we had to tack on subscripts
to keep the variables straight.

But $dV$ had the advantage of
suggesting, correctly, that we are
not concerned with direction (unlike
integration in first year calculus,
where $\int_a^b dx \neq \int_b^a dx$). We
hesitated at first to convey the 
same message with absolute value 
signs, for fear the notation would 
seem forbidding, but decided that 
the distinction between oriented 
and unoriented domains is so im- 
portant (it is a central theme of 
Chapter 6) that our notation 
should reflect that distinction. 


The notation Supp (support) 
should not be confused with sup 
(least upper bound). 

Recall that “least upper bound” 
and “supremum” are synonymous, 
as are “greatest lower bound” and 
“infimum” (Definitions 1.6.4 and 
1.6.6). 


This doesn’t get rid of difficulties like the coastline of Britain (indeed, such
a function $f$ will usually have discontinuities on the coastline), but putting all
the difficulties on the side of the function will make our definitions easier (or at
least shorter).

So while we really want to integrate $g$ (i.e., rainfall) over Britain, we define
that integral in terms of the integral of $f$ over $\mathbb{R}^n$, setting

$$\int_{\text{Britain}} g \, |d^n\mathbf{x}| = \int_{\mathbb{R}^n} f \, |d^n\mathbf{x}|. \tag{4.1.7}$$

More generally, when integrating over a subset $A \subset \mathbb{R}^n$,

$$\int_A g(\mathbf{x}) \, |d^n\mathbf{x}| = \int_{\mathbb{R}^n} g(\mathbf{x})\chi_A(\mathbf{x}) \, |d^n\mathbf{x}|. \tag{4.1.8}$$
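
The idea of pushing the complication from the domain into the function can be written out in a few lines. The following is a minimal sketch, not from the text: the names `characteristic` and `extend_by_zero`, the choice of the closed unit disk for the set $A$, and the sample function `g` are all assumptions made for illustration.

```python
import math

def characteristic(contains):
    """chi_A for the set A = {x : contains(x)}: 1 on A, 0 off A (Equation 4.1.5)."""
    return lambda x: 1 if contains(x) else 0

def extend_by_zero(g, contains):
    """The function f of Equation 4.1.4: g on A, 0 elsewhere; g is evaluated only on A."""
    return lambda x: g(x) if contains(x) else 0

# Hypothetical region A: the closed unit disk in the plane.
in_disk = lambda x: math.hypot(x[0], x[1]) <= 1
# Hypothetical "rainfall" function, here defined on the whole plane.
g = lambda x: 2 + x[0]

chi = characteristic(in_disk)
f = extend_by_zero(g, in_disk)
print(f((0.5, 0.0)), f((2.0, 0.0)))
```

`extend_by_zero` never evaluates `g` outside `A`, which matters when `g` is defined only on a set slightly larger than `A`. When `g` is defined everywhere, as here, `f(x)` agrees with the product `g(x) * chi(x)` of Equation 4.1.6.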


Some preliminary definitions and notation 

Before defining the Riemann integral, we need a few definitions. 

Definition 4.1.2 (Support of a function: $\operatorname{Supp}(f)$). The support of a
function $f : \mathbb{R}^n \to \mathbb{R}$ is

$$\operatorname{Supp}(f) = \{\mathbf{x} \in \mathbb{R}^n \mid f(\mathbf{x}) \neq 0\}. \tag{4.1.9}$$


Definition 4.1.3 ($M_A(f)$ and $m_A(f)$). If $A \subset \mathbb{R}^n$ is an arbitrary subset,
we will denote by

$$M_A(f) = \sup_{\mathbf{x} \in A} f(\mathbf{x}), \text{ the supremum of } f(\mathbf{x}) \text{ for } \mathbf{x} \in A$$

$$m_A(f) = \inf_{\mathbf{x} \in A} f(\mathbf{x}), \text{ the infimum of } f(\mathbf{x}) \text{ for } \mathbf{x} \in A. \tag{4.1.10}$$

Definition 4.1.4 (Oscillation). The oscillation of $f$ over $A$, denoted
$\operatorname{osc}_A(f)$, is the difference between its least upper bound and greatest lower
bound:

$$\operatorname{osc}_A(f) = M_A(f) - m_A(f). \tag{4.1.11}$$


Definition of the Riemann integral: dyadic pavings

In Sections 4.1-4.9 we will discuss only integrals of functions $f$ satisfying

(1) $|f|$ is bounded, and

(2) $f$ has bounded support, i.e., there exists $R$ such that $f(\mathbf{x}) = 0$ when
$|\mathbf{x}| > R$.

With these restrictions on $f$, and for any subset $A \subset \mathbb{R}^n$, each quantity
$M_A(f)$, $m_A(f)$, and $\operatorname{osc}_A(f)$ is a well-defined finite number. This is not true
In Section 4.7 we will see that
much more general pavings can be
used.

We call our pavings dyadic because
each time we divide by a
factor of 2; “dyadic” comes from
the Greek dyas, meaning two. We
could use decimal pavings instead,
cutting each side into ten parts
each time, but dyadic pavings are
easier to draw.


Figure 4.1.3.

A dyadic decomposition in $\mathbb{R}^2$.
The entire figure is a “cube” in $\mathbb{R}^2$
at level $N = 0$, with side length
$1/2^0 = 1$. At level 1 (upper left
quadrant), cubes have side length
$1/2^1 = 1/2$; at level 2 (upper right
quadrant), they have side length
$1/2^2 = 1/4$; and so on.


for a function like $f(x) = 1/x$, defined on the open interval $(0, 1)$. In that case
$|f|$ is not bounded, and $\sup f(x) = \infty$.

There is quite a bit of choice as to how to define the integral; we will first 
use the most restrictive definition: dyadic pavings of R". 

To compute an integral in one dimension, we decompose the domain into 
little intervals, and construct on each the tallest rectangle which fits under the
graph and the shortest rectangle which contains it, as shown in Figure 4.1.2.


Figure 4.1.2. Left: Lower Riemann sum for $\int_a^b f(x) \, dx$. Right: Upper Riemann
sum. If the two sums converge to a common limit, that limit is the integral of the
function.


The dyadic upper and lower sums correspond to decomposing the domain 
first at the integers, then at the half-integers, then at the quarter-integers, etc. 

If, as we make the rectangles skinnier and skinnier, the sum of the area of 
the upper rectangles approaches that of the lower rectangles, the function is 
integrable. We can then compute the integral by adding areas of rectangles — 
either the lower rectangles, the upper rectangles, or rectangles constructed some 
other way, for example by using the value of the function at the middle of each 
column as the height of the rectangle. The choice of the point at which to 
measure the height doesn’t matter since the areas of the lower rectangles and 
the upper rectangles can be made arbitrarily close. 
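
The freedom in choosing where to measure the height is easy to experiment with. Here is a minimal sketch, not from the text: the function `riemann_sum`, its `position` parameter, and the test function $x^2$ on $[0, 1]$ (whose integral is $1/3$) are assumptions chosen for illustration.

```python
def riemann_sum(f, a, b, pieces, position):
    """Sum f at one chosen point of each subinterval of [a, b].

    position is a number in [0, 1]: 0 picks the left end of each
    subinterval, 0.5 the middle, 1 the right end.
    """
    width = (b - a) / pieces
    return sum(f(a + (i + position) * width) for i in range(pieces)) * width

square = lambda x: x * x

# 16 subintervals of [0, 1], i.e. the dyadic level N = 4
left = riemann_sum(square, 0.0, 1.0, 16, 0.0)
middle = riemann_sum(square, 0.0, 1.0, 16, 0.5)
print(left, middle)
```

For this increasing function the left-endpoint sum is the lower sum. Both choices approach $1/3$ as the number of pieces grows, but at the same level the midpoint sum is already much closer.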

To use dyadic pavings in $\mathbb{R}^n$ we do essentially the same thing. We cut up
$\mathbb{R}^n$ into cubes with sides 1 long, like the big square of Figure 4.1.3. (By “cube”
we mean an interval in $\mathbb{R}$, a square in $\mathbb{R}^2$, a cube in $\mathbb{R}^3$, and analogs of cubes in
higher dimensions.) Next we cut each side of a cube in half, cutting an interval
in half, a square into four equal squares, a cube into eight equal cubes .... At
the next level we cut each side of those in half, and so on.

To define dyadic pavings in $\mathbb{R}^n$ precisely, we must first say what we mean by
an $n$-dimensional “cube.” For every

$$\mathbf{k} = \begin{pmatrix} k_1 \\ \vdots \\ k_n \end{pmatrix} \in \mathbb{Z}^n, \quad \text{where } \mathbb{Z} \text{ represents the integers,} \tag{4.1.12}$$
we define the cube

$$C_{\mathbf{k},N} = \left\{ \mathbf{x} \in \mathbb{R}^n \;\middle|\; \frac{k_i}{2^N} \leq x_i < \frac{k_i + 1}{2^N} \text{ for } 1 \leq i \leq n \right\}. \tag{4.1.13}$$


Figure 4.1.3.


Each cube $C$ has two indices. The first index, $\mathbf{k}$, locates each cube: it gives
the numerators of the coordinates of the cube’s lower left-hand corner, when the
denominator is $2^N$. The second index, $N$, tells which “level” we are considering,
starting with 0; you may think of $N$ as the “fineness” of the cube. The length
of a side of a cube is $1/2^N$, so when $N = 0$, each side of a cube is length 1;
when $N = 1$, each side is length 1/2; when $N = 2$, each side is length 1/4. The
bigger $N$ is, the finer the decomposition and the smaller the cubes.


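Example 4.1.5 (Dyadic cubes). The small shaded cube in the lower right-hand
quadrant of Figure 4.1.3 (repeated at left) is

$$C_{\left(\begin{smallmatrix} 9 \\ 6 \end{smallmatrix}\right),4} = \Bigl\{ \mathbf{x} \in \mathbb{R}^2 \;\Bigm|\; \underbrace{\tfrac{9}{16} \leq x < \tfrac{10}{16}}_{\text{width of cube}};\ \underbrace{\tfrac{6}{16} \leq y < \tfrac{7}{16}}_{\text{height of cube}} \Bigr\}. \tag{4.1.14}$$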
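
Because the inequalities in Equation 4.1.13 are $\leq$ on the left and $<$ on the right, the index of the cube containing a given point is obtained by rounding $2^N x_i$ down. The following is a minimal sketch, not from the text; the function names `dyadic_index` and `cube_bounds` are assumptions made for illustration.

```python
import math

def dyadic_index(x, N):
    """Return k with k_i / 2^N <= x_i < (k_i + 1) / 2^N for every coordinate of x."""
    return tuple(math.floor(xi * 2**N) for xi in x)

def cube_bounds(k, N):
    """Return the half-open intervals [k_i / 2^N, (k_i + 1) / 2^N) as (low, high) pairs."""
    side = 1 / 2**N
    return [(ki * side, (ki + 1) * side) for ki in k]

# A point inside the cube of Example 4.1.5, at level N = 4
k = dyadic_index((0.6, 0.4), 4)
print(k, cube_bounds(k, 4))
```

A point on a shared edge, such as $x = 1/2$ at level 1, lands in the cube to its right, which is what the half-open convention requires. For a negative coordinate the floor rounds toward $-\infty$, so $-0.1$ lies in the cube with index $-1$ at level 0.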


In Equation 4.1.13, we chose
the inequalities $\leq$ to the left of $x_i$
and $<$ to the right so that at every
level, every point of $\mathbb{R}^n$ is in
exactly one cube. We could just
as easily put them in the opposite
order; allowing the edges to overlap
wouldn’t be a problem either.


For a three-dimensional cube, $\mathbf{k}$ has three entries, and each cube $C_{\mathbf{k},N}$ consists
of the $\mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} \in \mathbb{R}^3$ such that

$$\underbrace{\frac{k_1}{2^N} \leq x < \frac{k_1 + 1}{2^N}}_{\text{width of cube}};\quad \underbrace{\frac{k_2}{2^N} \leq y < \frac{k_2 + 1}{2^N}}_{\text{length of cube}};\quad \underbrace{\frac{k_3}{2^N} \leq z < \frac{k_3 + 1}{2^N}}_{\text{height of cube}}. \tag{4.1.15}$$


The collection of all these cubes paves $\mathbb{R}^n$:


Definition 4.1.6 (Dyadic pavings). The collection of cubes $C_{\mathbf{k},N}$ at a
single level $N$, denoted $\mathcal{D}_N(\mathbb{R}^n)$, is the $N$th dyadic paving of $\mathbb{R}^n$.


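We use $\operatorname{vol}_n$ to denote $n$-dimensional
volume.


The $n$-dimensional volume of a cube $C$ is the product of the lengths of its
sides. Since the length of one side is $1/2^N$, the $n$-dimensional volume is

$$\operatorname{vol}_n C = \left(\frac{1}{2^N}\right)^n; \quad \text{i.e.,} \quad \operatorname{vol}_n C = \frac{1}{2^{nN}}. \tag{4.1.16}$$

Note that all $C \in \mathcal{D}_N$ (all cubes at a given resolution) have the same
$n$-dimensional volume.


You are asked to prove Equation
4.1.17 in Exercise 4.1.5.


The distance between two points $\mathbf{x}, \mathbf{y}$ in a cube $C \in \mathcal{D}_N$ is

$$|\mathbf{x} - \mathbf{y}| \leq \frac{\sqrt{n}}{2^N}. \tag{4.1.17}$$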
As in Definition 4.1.3, $M_C(f)$
denotes the least upper bound,
and $m_C(f)$ denotes the greatest
lower bound.

Since we are assuming that / 
has bounded support, these sums 
have only finitely many terms. 
Each term is finite, since / itself 
is bounded. 


We invite you to turn this ar- 
gument into a formal proof. 


Thus two points in the same cube C are close if N is large. 

Upper and lower sums using dyadic pavings 

With a Riemann sum in one dimension we sum the areas of the upper rectangles 
and the areas of the lower rectangles, and say that a function is integrable if the 
upper and lower sums approach a common limit as the decomposition becomes 
finer and finer. The common limit is the integral. 

We will do the same thing here. We define the $N$th upper and lower sums

$$\underbrace{U_N(f) = \sum_{C \in \mathcal{D}_N} M_C(f) \operatorname{vol}_n C}_{\text{upper sum}}, \qquad \underbrace{L_N(f) = \sum_{C \in \mathcal{D}_N} m_C(f) \operatorname{vol}_n C}_{\text{lower sum}}. \tag{4.1.18}$$

For the $N$th upper sum we compute, for each cube $C$ at level $N$, the product
of the least upper bound of the function over the cube and the volume of the
cube, and we add the products together. For the lower sum we do the same
thing, using the greatest lower bound. Since for these pavings all the cubes
have the same volume, it can be factored out:

$$U_N(f) = \underbrace{\frac{1}{2^{nN}}}_{\text{vol. of cube}} \sum_{C \in \mathcal{D}_N} M_C(f), \qquad L_N(f) = \underbrace{\frac{1}{2^{nN}}}_{\text{vol. of cube}} \sum_{C \in \mathcal{D}_N} m_C(f). \tag{4.1.19}$$


Proposition 4.1.7. As $N$ increases, the sequence $U_N(f)$ decreases, and the
sequence $L_N(f)$ increases.

Think of a two-dimensional function, whose graph is a surface with moun- 
tains and valleys. At a coarse level, where each cube (i.e., square) covers a lot 
of area, a square containing both a mountain peak and a valley will contribute 
a lot to the upper sum; the mountain peak will be the least upper bound for 
the entire large square. As N increases, the peak is the least upper bound for 
a much smaller square; other small squares that were part of the original big 
square will have a much smaller least upper bound. 

The same argument holds, in reverse, for the lower sum; if a large square 
contains a deep valley, the entire square will have a low greatest lower bound, 
contributing to a small lower sum. As N increases and the squares get smaller, 
the valley will have less of an impact, and the lower sum will increase. 
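
The behavior described in Proposition 4.1.7 can be watched on a small example. The sketch below is not from the text: it assumes the one-variable function $f(x) = x(1 - x)$ on $[0, 1]$, set to 0 elsewhere, chosen because it is continuous with a single peak at $1/2$, so its supremum and infimum over each dyadic interval can be read off exactly. The function names are likewise assumptions.

```python
from fractions import Fraction

def f(x):
    """x(1 - x) on [0, 1], zero elsewhere: bounded, with bounded support."""
    return x * (1 - x) if 0 <= x <= 1 else Fraction(0)

def dyadic_sums(N):
    """Return (U_N(f), L_N(f)), summing over the intervals [k/2^N, (k+1)/2^N) in [0, 1)."""
    half = Fraction(1, 2)
    side = Fraction(1, 2**N)
    upper = lower = Fraction(0)
    for k in range(2**N):
        a, b = k * side, (k + 1) * side
        # f is continuous with one peak at 1/2, so over [a, b) the supremum is
        # f(1/2) when the peak lies in [a, b], and otherwise sits at an endpoint.
        M = f(half) if a <= half <= b else max(f(a), f(b))
        # The infimum of such a function is always at an endpoint.
        m = min(f(a), f(b))
        upper += M * side
        lower += m * side
    return upper, lower

for N in range(7):
    U, L = dyadic_sums(N)
    print(N, float(U), float(L))
```

Intervals outside $[0, 1)$ are skipped because $f$ vanishes there and at the endpoints 0 and 1, so they contribute nothing to either sum. In this example “decreases” and “increases” hold in the weak sense: the upper sum stays at $1/4$ from level 0 to level 1 before it starts to drop. The value $1/6$, the integral of $x(1 - x)$ over $[0, 1]$, always lies between the two sums.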

We are now ready to define the multiple integral. First we will define upper
and lower integrals. 

Definition 4.1.8 (Upper and lower integrals). We call

$$U(f) = \lim_{N \to \infty} U_N(f) \quad \text{and} \quad L(f) = \lim_{N \to \infty} L_N(f) \tag{4.1.20}$$

the upper and lower integrals of $f$.


Definition 4.1.9 (Integrable function). A function $f : \mathbb{R}^n \to \mathbb{R}$ is integrable
if its upper and lower integrals are equal; its multiple integral is then
denoted

$$\int_{\mathbb{R}^n} f \, |d^n\mathbf{x}| = U(f) = L(f). \tag{4.1.21}$$


Of course, we are simply computing

$$\int_0^1 x \, dx = \left[\frac{x^2}{2}\right]_0^1 = \frac{1}{2}.$$

The point of this example is to
show that this integral, almost the
easiest that calculus provides, can
be evaluated by dyadic sums.

Since we are in dimension 1,
our cubes are intervals:

$$C_{k,N} = [k/2^N, (k+1)/2^N).$$


It is rather hard to find integrals that can be computed directly from the 
definition; here is one. 


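Example 4.1.10 (Computing an integral). Let
$$f(x) = \begin{cases} x & \text{if } 0 \le x < 1 \\ 0 & \text{otherwise,} \end{cases} \tag{4.1.22}$$
which we could express (using the characteristic function) as the product
$$f(x) = x\,\chi_{[0,1)}(x). \tag{4.1.23}$$
First, note that $f$ is bounded with bounded support. Unless $0 \le k/2^N < 1$,
we have
$$m_{C_{k,N}}(f) = M_{C_{k,N}}(f) = 0. \tag{4.1.24}$$
If $0 \le k/2^N < 1$, then
$$\underbrace{m_{C_{k,N}}(f) = \frac{k}{2^N}}_{\substack{\text{greatest lower bound of } f \text{ over } C_{k,N} \\ \text{is the beginning of the interval}}} \quad\text{and}\quad \underbrace{M_{C_{k,N}}(f) = \frac{k+1}{2^N}}_{\substack{\text{lowest upper bound of } f \text{ over } C_{k,N} \\ \text{is the beginning of the next interval}}} \tag{4.1.25}$$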


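Thus
$$L_N(f) = \frac{1}{2^N} \sum_{k=0}^{2^N - 1} \frac{k}{2^N} \quad\text{and}\quad U_N(f) = \frac{1}{2^N} \sum_{k=1}^{2^N} \frac{k}{2^N}. \tag{4.1.26}$$
In particular, $U_N(f) - L_N(f) = 2^N/2^{2N} = 1/2^N$, which tends to 0 as $N$
tends to $\infty$, so $f$ is integrable. Evaluating the integral requires the formula
$1 + 2 + \cdots + m = m(m+1)/2$. Using this formula, we find
$$L_N(f) = \frac{1}{2^N} \cdot \frac{(2^N - 1)\,2^N}{2 \cdot 2^N} = \frac{1}{2}\left(1 - \frac{1}{2^N}\right) \quad\text{and}\quad U_N(f) = \frac{1}{2^N} \cdot \frac{2^N (2^N + 1)}{2 \cdot 2^N} = \frac{1}{2}\left(1 + \frac{1}{2^N}\right). \tag{4.1.27}$$
Clearly both sums converge to 1/2 as $N$ tends to $\infty$. △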
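The sums of Example 4.1.10 can also be checked by brute force. The following Python sketch is an illustration only (the function name `dyadic_sums` is ours, not the text's); it uses exact rational arithmetic and takes the greatest lower bound and least upper bound on each interval from Equation 4.1.25.

```python
from fractions import Fraction

def dyadic_sums(N):
    """Return (L_N(f), U_N(f)) exactly for f(x) = x on [0, 1), 0 elsewhere."""
    width = Fraction(1, 2**N)            # volume (length) of each dyadic interval
    # Only the intervals C_{k,N} with 0 <= k/2^N < 1 contribute.
    lower = sum(Fraction(k, 2**N) * width for k in range(2**N))      # inf is k/2^N
    upper = sum(Fraction(k + 1, 2**N) * width for k in range(2**N))  # sup is (k+1)/2^N
    return lower, upper

for N in range(1, 6):
    lower, upper = dyadic_sums(N)
    # The lower sums increase, the upper sums decrease, and the gap is 1/2^N.
    print(N, lower, upper, upper - lower)
```

The printed values agree with the closed forms in Equation 4.1.27, and they illustrate Proposition 4.1.7: the lower sums increase and the upper sums decrease toward 1/2.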
Warning! Before doing this you must know that your function is integrable:
that the upper and lower sums converge to a common limit. It is perfectly
possible for a Riemann sum to converge without the function being integrable
(see Exercise 4.1.6). In that case, the limit doesn’t mean much, and should be
viewed with distrust.

In computing a Riemann sum, 
any point will do, but some are 
better than others. The sum will 
converge faster if you use the cen- 
ter point rather than a corner. 

When the dimension gets really large, like $10^{24}$, as happens in quantum
field theory and statistical mechanics, even in straightforward cases no one
knows how to evaluate such integrals, and their behavior is a central problem
in the mathematics of the field. We give an introduction to Riemann sums as
they are used in practice in Section 4.6.


Riemann sums 

Computing the upper integral $U(f)$ and the lower integral $L(f)$ may be difficult.
Suppose we know that $f$ is integrable. Then, just as for Riemann sums in one
dimension, we can choose any point $\mathbf{x}_{k,N} \in C_{k,N}$ we like, such as the center of
each cube, or the lower left-hand corner, and consider the Riemann sum
$$R(f, N) = \sum_{k \in \mathbb{Z}^n} \underbrace{\operatorname{vol}_n(C_{k,N})}_{\text{“width”}}\; \underbrace{f(\mathbf{x}_{k,N})}_{\text{“height”}}. \tag{4.1.28}$$
Then since the value of the function at some arbitrary point $\mathbf{x}_{k,N}$ is bounded
above by the least upper bound, and below by the greatest lower bound,
$$m_{C_{k,N}} f \le f(\mathbf{x}_{k,N}) \le M_{C_{k,N}} f, \tag{4.1.29}$$
the Riemann sums $R(f, N)$ will converge to the integral.

Computing multiple integrals by Riemann sums is conceptually no harder 
than computing one-dimensional integrals; it simply takes longer. Even when 
the dimension is only moderately large (for instance 3 or 4) this is a serious 
problem. It becomes much more serious when the dimension is 9 or 10; even in 
those dimensions, getting a numerical integral correct to six significant digits 
may be unrealistic. 
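To make the choice of sample point concrete, here is a small Python sketch of Equation 4.1.28 in dimension 2. The test function $f(x,y) = x^2 y$ on the unit square (taken to be 0 elsewhere) is our own choice for illustration; its integral is 1/6. The sketch compares the lower left-hand corner with the center of each dyadic square.

```python
def riemann_sum(f, N, offset):
    """Riemann sum of f over the dyadic squares of level N inside [0,1)^2.

    offset = 0.0 samples the lower left-hand corner of each square,
    offset = 0.5 samples its center.
    """
    m = 2**N
    side = 1.0 / m
    total = 0.0
    for i in range(m):
        for j in range(m):
            x = (i + offset) * side
            y = (j + offset) * side
            total += f(x, y) * side * side   # "height" times "width" (area of the square)
    return total

def f(x, y):
    return x * x * y      # assumed test function; its integral over the unit square is 1/6

exact = 1.0 / 6.0
for N in range(1, 7):
    corner_error = abs(riemann_sum(f, N, 0.0) - exact)
    center_error = abs(riemann_sum(f, N, 0.5) - exact)
    print(N, corner_error, center_error)
```

The number of function evaluations is $2^{nN}$, here $4^N$, which is the reason the method becomes impractical as the dimension $n$ grows.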

Some rules for computing multiple integrals 

A certain number of results are more or less obvious: 

Proposition 4.1.11 (Rules for computing multiple integrals). 

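(a) If two functions $f, g : \mathbb{R}^n \to \mathbb{R}$ are both integrable, then $f + g$ is also
integrable, and the integral of $f + g$ equals the sum of the integral of $f$ and
the integral of $g$:
$$\int_{\mathbb{R}^n} (f + g)\,|d^n\mathbf{x}| = \int_{\mathbb{R}^n} f\,|d^n\mathbf{x}| + \int_{\mathbb{R}^n} g\,|d^n\mathbf{x}|. \tag{4.1.30}$$
(b) If $f$ is an integrable function, and $a \in \mathbb{R}$, then the integral of $af$ equals
$a$ times the integral of $f$:
$$\int_{\mathbb{R}^n} (af)\,|d^n\mathbf{x}| = a \int_{\mathbb{R}^n} f\,|d^n\mathbf{x}|. \tag{4.1.31}$$
(c) If $f, g$ are integrable functions with $f \le g$ (i.e., $f(\mathbf{x}) \le g(\mathbf{x})$ for all $\mathbf{x}$),
then the integral of $f$ is less than or equal to the integral of $g$:
$$\int_{\mathbb{R}^n} f\,|d^n\mathbf{x}| \le \int_{\mathbb{R}^n} g\,|d^n\mathbf{x}|. \tag{4.1.32}$$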
An example of Equation 4.1.33: if $f$ and $g$ are functions of census tracts,
$f$ assigning to each per capita income for April through September, and $g$
per capita income for October through March, then the sum of the maximum
value for $f$ and the maximum value for $g$ must be at least the maximum value
of $f + g$, and very likely more: a community dependent on the construction
industry might have the highest per capita income in the summer months, while
a ski resort might have the highest per capita income in the winter.


You can read Equation 4.1.37 to mean “the integral of the product
$f_1(\mathbf{x}) f_2(\mathbf{y})$ equals the product of the integrals,” but please note that we
are not saying, and it is not true, that for two functions with the same
variable, the integral of the product is the product of the integrals. There
is no formula for $\int f_1(\mathbf{x}) f_2(\mathbf{x})$. The two functions of Proposition 4.1.12
have different variables.


Proof. (a) For any subset $A \subset \mathbb{R}^n$, we have
$$M_A(f) + M_A(g) \ge M_A(f + g) \quad\text{and}\quad m_A(f) + m_A(g) \le m_A(f + g). \tag{4.1.33}$$
Applying this to each cube $C \in \mathcal{D}_N(\mathbb{R}^n)$ we get
$$U_N(f) + U_N(g) \ge U_N(f + g) \ge L_N(f + g) \ge L_N(f) + L_N(g). \tag{4.1.34}$$
Since the outer terms have a common limit as $N \to \infty$, the inner ones have the
same limit, giving
$$\underbrace{\lim_{N \to \infty} \bigl(U_N(f) + U_N(g)\bigr)}_{\int_{\mathbb{R}^n} f\,|d^n\mathbf{x}| + \int_{\mathbb{R}^n} g\,|d^n\mathbf{x}|} = \underbrace{\lim_{N \to \infty} U_N(f + g) = \lim_{N \to \infty} L_N(f + g)}_{\int_{\mathbb{R}^n} (f + g)\,|d^n\mathbf{x}|} = \lim_{N \to \infty} \bigl(L_N(f) + L_N(g)\bigr). \tag{4.1.35}$$

(b) If $a > 0$, then $U_N(af) = a\,U_N(f)$ and $L_N(af) = a\,L_N(f)$ for any $N$, so
the integral of $af$ is $a$ times the integral of $f$.

If $a < 0$, then $U_N(af) = a\,L_N(f)$ and $L_N(af) = a\,U_N(f)$, so the result is also
true: multiplying by a negative number turns the upper limit into a lower limit,
and vice versa.

(c) This is clear: $U_N(f) \le U_N(g)$ for every $N$. □

The following statement follows immediately from Fubini's theorem, which
is discussed in Section 4.5, but it fits in nicely here.

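Proposition 4.1.12. If $f_1(\mathbf{x})$ is integrable on $\mathbb{R}^n$ and $f_2(\mathbf{y})$ is integrable on
$\mathbb{R}^m$, then the function
$$g(\mathbf{x}, \mathbf{y}) = f_1(\mathbf{x}) f_2(\mathbf{y}) \tag{4.1.36}$$
on $\mathbb{R}^{n+m}$ is integrable, and
$$\int_{\mathbb{R}^{n+m}} g\,|d^n\mathbf{x}|\,|d^m\mathbf{y}| = \left(\int_{\mathbb{R}^n} f_1\,|d^n\mathbf{x}|\right) \left(\int_{\mathbb{R}^m} f_2\,|d^m\mathbf{y}|\right). \tag{4.1.37}$$

Proof. For any $A_1 \subset \mathbb{R}^n$ and $A_2 \subset \mathbb{R}^m$, we have
$$M_{A_1 \times A_2}(g) = M_{A_1}(f_1)\,M_{A_2}(f_2) \quad\text{and}\quad m_{A_1 \times A_2}(g) = m_{A_1}(f_1)\,m_{A_2}(f_2). \tag{4.1.38}$$
Since any $C \in \mathcal{D}_N(\mathbb{R}^{n+m})$ is of the form $C_1 \times C_2$ with $C_1 \in \mathcal{D}_N(\mathbb{R}^n)$ and
$C_2 \in \mathcal{D}_N(\mathbb{R}^m)$, applying Equation 4.1.38 to each cube separately gives
$$U_N(g) = U_N(f_1)\,U_N(f_2) \quad\text{and}\quad L_N(g) = L_N(f_1)\,L_N(f_2). \tag{4.1.39}$$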

The result follows immediately. □ 
Volume defined more generally

The computation of volumes, historically the main motivation for integrals, 
remains an important application. We used the volume of cubes to define the 
integral; we now use integrals to define volume more generally. 

Definition 4.1.13 (n-dimensional volume). When $\chi_A$ is integrable, the
n-dimensional volume of $A$ is
$$\operatorname{vol}_n A = \int_{\mathbb{R}^n} \chi_A\,|d^n\mathbf{x}|. \tag{4.1.40}$$


Some texts refer to pavable sets as “contented” sets: sets with content.


Thus $\operatorname{vol}_1$ is length of subsets of $\mathbb{R}$, $\operatorname{vol}_2$ is area of subsets of $\mathbb{R}^2$, and so on.
We already defined the volume of dyadic cubes in Equation 4.1.16. In
Proposition 4.1.16 we will see that these definitions are consistent.

Definition 4.1.14 (Pavable set: a set with well-defined volume). A 
set is pavable if it has a well-defined volume, i.e., if its characteristic function 
is integrable. 
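As an illustration of Definitions 4.1.13 and 4.1.14, the Python sketch below computes the lower and upper dyadic sums of $\chi_A$ for a set of our own choosing: the quarter disk $A = \{x \ge 0,\ y \ge 0,\ x^2 + y^2 \le 1\}$, whose area is $\pi/4$. A dyadic square $[i/2^N, (i+1)/2^N) \times [j/2^N, (j+1)/2^N)$ with $i, j \ge 0$ meets $A$ exactly when its lower left-hand corner lies in $A$, and is contained in $A$ exactly when its upper right-hand corner is at distance at most 1 from the origin; integer arithmetic keeps both tests exact.

```python
import math

def quarter_disk_sums(N):
    """Return (L_N, U_N) for the characteristic function of the quarter disk."""
    m = 2**N
    inside = 0     # squares contained in A: greatest lower bound of chi_A is 1
    meeting = 0    # squares meeting A: least upper bound of chi_A is 1
    for i in range(m + 1):
        for j in range(m + 1):
            if i * i + j * j <= m * m:                    # nearest corner is in A
                meeting += 1
            if (i + 1) ** 2 + (j + 1) ** 2 <= m * m:      # farthest corner within distance 1
                inside += 1
    area = 1.0 / (m * m)                                  # volume of one dyadic square
    return inside * area, meeting * area

for N in range(2, 9):
    lower, upper = quarter_disk_sums(N)
    print(N, lower, upper, math.pi / 4)
```

The two sums squeeze $\pi/4$ from below and above, and their difference shrinks roughly like $1/2^N$, which is the numerical face of the statement that the quarter disk is pavable.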


Lemma 4.1.15 (Length of interval). An interval $I = [a, b]$ has volume (i.e.,
length) $|b - a|$.


Proof. Of the cubes (i.e., intervals) $C \in \mathcal{D}_N(\mathbb{R})$, at most two contain one of
the endpoints $a$ or $b$. All the others are either entirely in $I$ or entirely outside,
so on those
$$M_C(\chi_I) = m_C(\chi_I) = \begin{cases} 1 & \text{if } C \subset I \\ 0 & \text{if } C \cap I = \emptyset, \end{cases} \tag{4.1.41}$$
where $\emptyset$ denotes the empty set. Therefore the difference between upper and
lower sums is at most two times the volume of a single cube:
$$U_N(\chi_I) - L_N(\chi_I) \le 2 \cdot \frac{1}{2^N}, \tag{4.1.42}$$
which tends to 0 as $N \to \infty$, so the upper and lower sums converge to the same
limit: $\chi_I$ is integrable, and $I$ has volume. We leave its computation as Exercise
4.1.13. □

The volume of a cube is $1/2^{nN}$, but here $n = 1$.

Recall from Section 0.3 that $P = I_1 \times \cdots \times I_n \subset \mathbb{R}^n$ means
$P = \{\mathbf{x} \in \mathbb{R}^n \mid x_i \in I_i\}$; thus $P$ is a rectangle if $n = 2$, a
box if $n = 3$, and an interval if $n = 1$.

Similarly, parallelepipeds with sides parallel to the axes have the volume one 
expects, namely, the product of the lengths of the sides. Consider 


$$P = I_1 \times \cdots \times I_n \subset \mathbb{R}^n. \tag{4.1.43}$$
Proposition 4.1.16 (Volume of parallelepiped). The parallelepiped
$$P = I_1 \times \cdots \times I_n \subset \mathbb{R}^n \tag{4.1.44}$$
formed by the product of intervals $I_i = [a_i, b_i]$ has volume
$$\operatorname{vol}_n(P) = |b_1 - a_1|\,|b_2 - a_2| \cdots |b_n - a_n|. \tag{4.1.45}$$
In particular, the n-dimensional volume of a cube $C \in \mathcal{D}_N(\mathbb{R}^n)$ is
$$\operatorname{vol}_n C = \frac{1}{2^{nN}}. \tag{4.1.46}$$

Proof. This follows immediately from Proposition 4.1.12, applied to
$$\chi_P(\mathbf{x}) = \chi_{I_1}(x_1)\,\chi_{I_2}(x_2) \cdots \chi_{I_n}(x_n). \quad \square \tag{4.1.47}$$

The following elementary result has powerful consequences (though these 
will only become clear later). 

Disjoint means having no points in common.

Theorem 4.1.17 (Sum of volumes). If two disjoint sets $A, B$ in $\mathbb{R}^n$ are
pavable, then so is their union, and the volume of the union is the sum of
the volumes:
$$\operatorname{vol}_n(A \cup B) = \operatorname{vol}_n A + \operatorname{vol}_n B. \tag{4.1.48}$$
Proof. Since $\chi_{A \cup B} = \chi_A + \chi_B$, this follows from Proposition 4.1.11, (a). □

Proposition 4.1.18 (Set with volume 0). A set $X \subset \mathbb{R}^n$ has volume 0
if and only if for every $\epsilon > 0$ there exists $N$ such that
$$\sum_{\substack{C \in \mathcal{D}_N(\mathbb{R}^n) \\ C \cap X \neq \emptyset}} \operatorname{vol}_n(C) \le \epsilon. \tag{4.1.49}$$

You are asked to prove Proposition 4.1.18 in Exercise 4.1.4.

Unfortunately, at the moment there are very few functions we can integrate;
we will have to wait until Section 4.5 before we can compute any really
interesting examples.

4.2 Probability and Integrals

Computing areas and volumes is one important application of multiple
integrals. There are many others, coming from a wide range of different fields:
geometry, mechanics, probability, .... Here we touch on a couple: computing
centers of gravity and computing probabilities. They sound quite different, but
the formulas are so similar that we think each helps in understanding the other.
Integrating density gives mass.


Here $\mu$ (mu) is a function from $A$ to $\mathbb{R}$; to a point of $A$ it associates
a number giving the density of $A$ at that point.

In physical situations $\mu$ will be non-negative.


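Definition 4.2.1 (Center of gravity of a body). (a) If a body $A \subset \mathbb{R}^n$
(i.e., a pavable set) is made of some homogeneous material, then the center
of gravity of $A$ is the point $\bar{\mathbf{x}}$ whose ith coordinate is
$$\bar{x}_i = \frac{\int_A x_i\,|d^n\mathbf{x}|}{\int_A |d^n\mathbf{x}|}. \tag{4.2.1}$$
(b) More generally, if a body $A$ (not necessarily made of a homogeneous
material) has density $\mu$, then the mass $M$ of such a body is
$$M = \int_A \mu(\mathbf{x})\,|d^n\mathbf{x}|, \tag{4.2.2}$$
and the center of gravity $\bar{\mathbf{x}}$ is the point whose ith coordinate is
$$\bar{x}_i = \frac{\int_A x_i\,\mu(\mathbf{x})\,|d^n\mathbf{x}|}{M}. \tag{4.2.3}$$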
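Equations 4.2.2 and 4.2.3 can be approximated by Riemann sums. The Python sketch below is only an illustration: the body (the triangle with vertices $(0,0)$, $(1,0)$, $(0,1)$), the sample densities, and the function name `center_of_gravity` are our own assumptions. It samples the center of each dyadic square of level $N$ in the unit square.

```python
def center_of_gravity(density, N):
    """Approximate the mass and center of gravity of the triangle
    A = {x >= 0, y >= 0, x + y <= 1} with the given density."""
    m = 2**N
    side = 1.0 / m
    mass = moment_x = moment_y = 0.0
    for i in range(m):
        for j in range(m):
            x = (i + 0.5) * side                 # center of the dyadic square
            y = (j + 0.5) * side
            if x + y <= 1.0:                     # characteristic function of A
                weight = density(x, y) * side * side
                mass += weight                   # Riemann sum for Equation 4.2.2
                moment_x += x * weight           # numerators of Equation 4.2.3
                moment_y += y * weight
    return mass, (moment_x / mass, moment_y / mass)

# Homogeneous body: the center of gravity should be close to (1/3, 1/3).
print(center_of_gravity(lambda x, y: 1.0, 7))
# Density increasing with x: the center of gravity moves to the right.
print(center_of_gravity(lambda x, y: 1.0 + x, 7))
```

With constant density the formula of part (b) reduces to that of part (a), since the constant cancels between numerator and denominator.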


We will see that in many problems in probability there is a similar function
$\mu$, giving the “density of probability.”

A brief introduction to probability theory 

In probability there is at the outset an experiment, which has a sample space 
S and a probability measure Prob. The sample space consists of all possible 
outcomes of the experiment. For example, if the experiment consists of throwing 
a six-sided die, then $S = \{1, 2, 3, 4, 5, 6\}$. The probability measure Prob takes
a subset $A \subset S$, called an event, and returns a number $\text{Prob}(A) \in [0, 1]$, which
corresponds to the probability of an outcome of the experiment being in $A$.
Thus the probability can range from 0 (it is certain that the outcome will not
be in $A$) to 1 (it is certain that it will be in $A$). We could restate the latter
statement as $\text{Prob}(S) = 1$.

When the probability space S consists of a finite number of outcomes, then 
Prob is completely determined by knowing the probabilities of the individual 
outcomes. When the outcomes are all equally likely, the probability assigned 
any one outcome is 1 divided by the number of outcomes; 1/6 in the case of the 
die. But often the outcomes are not equally likely. If the die is loaded so that 
it lands on 4 half the time, while the other outcomes are equally likely, then 
Prob{4} = 1/2, while the probability of each of the other five outcomes is
1/10.

When an event $A$ consists of several outcomes, $\text{Prob}(A)$ is computed by
adding together the weights corresponding to the elements of $A$. If the
experiment consists of throwing the loaded die described above, and $A = \{3, 4\}$, then
$\text{Prob}(A) = 1/10 + 1/2 = 3/5$. Since $\text{Prob}(S) = 1$, the sum of all the weights for
a given experiment always equals 1.

If you have a ten-sided die, with the sides marked 0 ... 9, you could
write your number in base 10 instead.
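For a finite sample space the probability measure is just a table of weights. A minimal Python sketch for the loaded die described above (the names `weights` and `prob` are ours):

```python
from fractions import Fraction

# Loaded die: 4 comes up half the time, the other five faces are equally likely.
weights = {face: Fraction(1, 10) for face in (1, 2, 3, 5, 6)}
weights[4] = Fraction(1, 2)

def prob(event):
    """Probability of an event, i.e., of a subset of the sample space."""
    return sum(weights[outcome] for outcome in event)

print(prob({3, 4}))          # the event A = {3, 4} of the text
print(prob(weights.keys()))  # the whole sample space S
```

The first value is 3/5 and the second is 1, as in the text.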

Integrals come into play when a probability space is infinite. We might 
consider the experiment of measuring how late (or early) a train is; the sample 
space is then some interval of time. Or we might play “spin the wheel,” in 
which case the sample space is the circle, and if the game is fair, the wheel has 
an equal probability of pointing in any direction. 

A third example, of enormous theoretical interest, consists of choosing a 
number $x \in [0, 1]$ by choosing its successive digits at random. For instance, you
might write x in base 2, and choose the successive digits by tossing a fair coin, 
writing 1 if the toss comes up heads, and 0 if it comes up tails. 

In these cases, the probability measure cannot be understood in terms of 
the probabilities of the individual outcomes, because each individual outcome 
has probability 0. Any particular infinite sequence of coin tosses is infinitely 
unlikely. Some other scheme is needed. Let us see how to understand probabil- 
ities in the last example above. It is true that the probability of any particular 
number, like $\{1/3\}$ or $\{\sqrt{2}/2\}$, is 0. But there are some subsets whose
probabilities are easy to compute. For instance $\text{Prob}([0, 1/2)) = 1/2$. Why? Because
$x \in [0, 1/2)$, which in base 2 is written $x \in [0, .1)$, means exactly that the first
digit of $x$ is 0. More generally, any dyadic interval $I \in \mathcal{D}_N(\mathbb{R})$ has probability
$1/2^N$, since it corresponds to $x$ starting with a particular sequence of $N$ digits,
and then makes no further requirement about the others. (Again, remember
that our numbers are in base 2.)

So for every dyadic interval, its probability is exactly its length. In fact,
since length (i.e., $\operatorname{vol}_1$) is defined in terms of dyadic intervals, we see that the
probability of any pavable subset $A \subset [0, 1]$ is precisely
$$\text{Prob}(A) = \int_{\mathbb{R}} \chi_A\,|dx|. \tag{4.2.4}$$

A similar description is probably possible in the case of late trains: there 
is likely a function $g(t)$ such that the probability of a train arriving in some
time interval $[a, b]$ is given by $\int_{[a,b]} g(t)\,dt$. One might imagine that the function
looks like a bell curve, perhaps centered at the scheduled time $t_0$, but perhaps
several minutes later if the train is systematically late. It might also happen 
that the curve is not bell-shaped, but camel-backed, reflecting the fact that if 
the train misses a certain light then it will be set back by some definite amount 
of time. 

In many cases where the sample space is $\mathbb{R}^k$, something of the same sort is
true: there is a function $\mu(\mathbf{x})$ such that
$$\text{Prob}(A) = \int_A \mu(\mathbf{x})\,|d^k\mathbf{x}|. \tag{4.2.5}$$
Of course some heights are impossible. Clearly, the height of such a girl
will not fall in the range 10-12 inches, or 15-20 feet. Including such
impossible outcomes in a sample space is standard practice. As William Feller
points out in An Introduction to Probability Theory and Its Applications,
vol. 1 (pp. 7-8), “According to formulas on which modern mortality tables are
based, the proportion of men surviving 1000 years is of the order of magnitude
of one in $10^{10^{36}}$ .... This statement does not make sense from a biological
or sociological point of view, but considered exclusively from a statistical
standpoint it certainly does not contradict any experience .... Moreover, if we
were seriously to discard the possibility of living 1000 years, we should have
to accept the existence of a maximum age, and the assumption that it should be
possible to live x years and impossible to live x years and two seconds is as
unappealing as the idea of unlimited life.”


In this case, $\mu$ is called a probability density; to be a probability density the
function $\mu$ must satisfy
$$\mu(\mathbf{x}) \ge 0 \quad\text{and}\quad \int_{\mathbb{R}^k} \mu(\mathbf{x})\,|d^k\mathbf{x}| = 1. \tag{4.2.6}$$

We will first look at an example in one variable; later we will build on this 
example to explore a use of multiple integrals (which are, after all, the reason 
we have written this section). 

Example 4.2.2 (Height of 10-year-old girls). Consider the experiment 
consisting of choosing a 10-year-old girl at random in the U.S., and measuring 
her height. Our sample space is $\mathbb{R}$. As in the case of choosing a real number
from 0 to 1, it makes no sense to talk about the probability of landing on any
one particular point in R. (No theoretical sense, at least; in practice, we are 
limited in our measurements, so this could be treated as a finite probability 
space.) What we can do is determine a “density of probability” function that 
will enable us to compute the probability of landing in some region of R, for 
example, height between 54 and 55 inches. 

Every pediatrician has growth charts furnished by the Department of Health, 
Education, and Welfare, which graph height and weight as a function of age, for 
girls from 2 to 18 years old; each consists of seven curves, representing the 5th, 
10th, 25th, 50th, 75th, 90th, and 95th percentiles, as shown in Figure 4.2.1. 



FIGURE 4.2.1. Charts graphing height and weight as a function of age (in years),
for girls from 2 to 18 years old.


Looking at the height chart and extrapolating (connecting the dots) we can
construct the bell-shaped curve shown in Figure 4.2.2, with a maximum at x =
54.5, since a 10-year-old girl who is 54.5 inches tall falls in the 50th percentile
for height. This curve is the graph of a function that we will call $\mu_h$; it gives
the “density of probability” for height for 10-year-old girls. For each particular
range of heights, it gives the probability that a 10-year-old girl chosen at random
will be that tall:
$$\text{Prob}\{h_1 \le h \le h_2\} = \int_{h_1}^{h_2} \mu_h(h)\,|dh|. \tag{4.2.7}$$
Similarly, we can construct a “density of probability” function $\mu_w$ for weight,
such that the probability of a child having weight $w$ satisfying $w_1 \le w \le w_2$ is
$$\text{Prob}\{w_1 \le w \le w_2\} = \int_{w_1}^{w_2} \mu_w(w)\,|dw|. \tag{4.2.8}$$
The integrals $\int_{\mathbb{R}} \mu_h(h)\,|dh|$ and $\int_{\mathbb{R}} \mu_w(w)\,|dw|$ must of course equal 1.

Figure 4.2.2. Top: graph of $\mu_h$, giving the “density of probability” for height
for 10-year-old girls. Bottom: graph of $\mu_w$, giving the “density of probability” for
weight.

It is not always possible to find a function $\mu$ that fits the available
data; in that case there is still a probability measure, but it is
not given by a density probability function.


Remark. Sometimes, as in the case of unloaded dice, we can figure out the 
appropriate probability measure on the basis of pure thought. More often, as in 
the “height experiment,” it is constructed from real data. A major part of the 
work of statisticians is finding probability density functions that fit available 
data. △

Once we know an experiment’s probability measure, we can compute the 
expectation of a random variable associated with the experiment. 


The name random function is 
more accurate, but nonstandard. 

The same experiment (i.e., 
same sample space and probabil- 
ity measure) can have more than 
one random variable. 

The words expectation , expected 
value, mean, and average are all 
synonymous. 

Since an expectation $E$ corresponds not just to a random variable $f$ but
also to an experiment with density of probability $\mu$, it would be more precise
to denote it by something like $E_\mu(f)$.


Definition 4.2.3 (Random variable). Let $S$ be the sample space of
outcomes of an experiment. A random variable is a function $f : S \to \mathbb{R}$.

If the experiment consists of throwing two dice, we might choose as our
random variable the function that gives the total obtained. For the height
experiment, we might choose the function $f_H$ that gives the height; in that
case, $f_H(x) = x$.

For each random variable, we can compute its expectation. 

Definition 4.2.4 (Expectation). The expectation $E(f)$ of a random variable
$f$ is the value one would expect to get if one did the experiment a great
many times and took the average of the results. If the sample space $S$ is
finite, $E(f)$ is computed by adding up the values $f(s)$ for all the outcomes $s$
(elements of $S$), each weighted by its probability of occurrence. If $S$ is
continuous, and $\mu$ is the density of probability function, then
$$E(f) = \int_S f(s)\,\mu(s)\,|ds|.$$


Example 4.2.5 (Expectation). The experiment consisting of throwing two
unloaded dice has 36 outcomes, each equally likely: for any $s \in S$, $\text{Prob}(s) = 1/36$.
In the dice example, the weight
for the total 3 is 2/36, since there 
are two ways of achieving the total 
3: (2,1) and (1,2). 

If you throw two dice 500 times 
and figure the average total, it 
should be close to 7; if not, you 
would be justified in suspecting 
that the dice are loaded. 

Equation 4.2.9: As in the finite 
case, we compute the expectation 
by “adding up” the various possi- 
ble outcomes, each weighted by its 
probability. 


Since $\text{Var}(f)$ is defined in terms of the expectation of $f$, and computing
the expectation requires knowing a probability measure, $\text{Var}(f)$ is associated
to a particular probability measure. The same is true of the definitions of
standard deviation, covariance, and correlation coefficient.


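Let $f$ be the random variable that gives the total obtained (i.e., the integers 2
through 12). To determine the expectation, we add up the possible totals, each
weighted by its probability:
$$2\,\tfrac{1}{36} + 3\,\tfrac{2}{36} + 4\,\tfrac{3}{36} + 5\,\tfrac{4}{36} + 6\,\tfrac{5}{36} + 7\,\tfrac{6}{36} + 8\,\tfrac{5}{36} + 9\,\tfrac{4}{36} + 10\,\tfrac{3}{36} + 11\,\tfrac{2}{36} + 12\,\tfrac{1}{36} = 7.$$
For the “height experiment” and “weight experiment” the expectations are
$$E(f_H) = \int_{\mathbb{R}} h\,\mu_h(h)\,|dh| \quad\text{and}\quad E(f_W) = \int_{\mathbb{R}} w\,\mu_w(w)\,|dw|. \quad \triangle \tag{4.2.9}$$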


Note that an expectation does not need to be a realizable number; if our
experiment consists of rolling a single die, and $f$ consists of seeing what number
we get, then $E(f) = \frac{1}{6} + \frac{2}{6} + \frac{3}{6} + \frac{4}{6} + \frac{5}{6} + \frac{6}{6} = 3.5$. Similarly, the average family
may be said to have 2.2 children ....
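The finite case of Definition 4.2.4 is easy to compute directly. The Python sketch below enumerates the 36 equally likely outcomes of two unloaded dice; the helper names (`outcomes`, `expectation`, `total`) are ours, chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs (first die, second die), each with probability 1/36.
outcomes = list(product(range(1, 7), repeat=2))
p = Fraction(1, len(outcomes))

def expectation(f):
    """Add up the values of the random variable f, each weighted by its probability."""
    return sum(f(s) * p for s in outcomes)

def total(s):
    return s[0] + s[1]

print(expectation(total))                    # expectation of the total
print(sum(p for s in outcomes if total(s) == 3))   # weight of the total 3

# A single die: the expectation is not a value the die can show.
single = sum(Fraction(face, 6) for face in range(1, 7))
print(single)
```

The three printed values are 7, 1/18 (that is, 2/36), and 7/2.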


Variance and standard deviation 

The expectation of a random variable is useful, but it can be misleading. 
Suppose the random variable $f$ assigns income to an element of the sample
space S, and S consists of 1000 supermarket cashiers and Bill Gates (or, indeed, 
1000 school teachers or university professors and Bill Gates); if all you knew 
was the average income, you might draw very erroneous conclusions. For a less 
extreme example, if a child’s weight is different from average, her parents may 
well want to know whether it falls within “normal” limits. The variance and 
the standard deviation address the question of how spread out a function is 
from its mean. 

Definition 4.2.6 (Variance). The variance of a random variable $f$,
denoted $\text{Var}(f)$, is given by the formula
$$\text{Var}(f) = E\bigl((f - E(f))^2\bigr) = \int_S \bigl(f(\mathbf{x}) - E(f)\bigr)^2 \mu(\mathbf{x})\,|d^k\mathbf{x}|. \tag{4.2.10}$$


Why the squared term in this formula? What we want to compute is how
far $f$ is, on average, from its average. But of course $f$ will be less than the
average just as much as it will be more than the average, so $E(f - E(f))$ is
0. We could solve this problem by computing the mean absolute deviation,
$E(|f - E(f)|)$. But this quantity is difficult to compute. In addition, squaring
$f - E(f)$ emphasizes the deviations that are far from the mean (the income
of Bill Gates, for example), so in some sense it gives a better picture of the
“spread” than does the mean absolute deviation.

But of course squaring $f - E(f)$ results in the variance having different units
than $f$. The standard deviation corrects for this:
Definition 4.2.7 (Standard deviation). The standard deviation of a
random variable $f$, denoted $\sigma(f)$, is given by
$$\sigma(f) = \sqrt{\text{Var}(f)}. \tag{4.2.11}$$


The name for the Greek letter $\sigma$ is “sigma.”


The mean absolute deviation for the “total obtained” random variable is
approximately 1.94, significantly less than the standard deviation. Because of
the square in the formula for the variance, the standard deviation weights more
heavily values that are far from the expectation than those that are close,
whereas the mean absolute deviation treats all deviations equally.


Indeed, a very important ap- 
plication of probability theory is 
to determine whether phenomena 
are related or not. Is a person 
subjected to second-hand smoke 
more likely to get lung cancer than 
someone who is not? Is total fat 
consumption related to the inci- 
dence of heart disease? Does par- 
ticipating in Head Start increase 
the chances that a child from a 
poor family will graduate from 
high school? 


Example 4.2.8 (Variance and standard deviation). If the experiment is
throwing two dice, and the random variable gives the total obtained, then the
variance is
$$\tfrac{1}{36}(2 - 7)^2 + \tfrac{2}{36}(3 - 7)^2 + \cdots + \tfrac{6}{36}(7 - 7)^2 + \cdots + \tfrac{2}{36}(11 - 7)^2 + \tfrac{1}{36}(12 - 7)^2 = 5.833\ldots, \tag{4.2.12}$$
and the standard deviation is $\sqrt{5.833\ldots} \approx 2.415$.
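The numbers in Example 4.2.8, and the mean absolute deviation mentioned alongside Definition 4.2.7, can be reproduced with a few lines of Python. This is a sketch with helper names of our own choosing; it recomputes the expectation from the 36 equally likely outcomes.

```python
import math
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # two unloaded dice
p = Fraction(1, len(outcomes))

def expectation(f):
    return sum(f(s) * p for s in outcomes)

def variance(f):
    mean = expectation(f)
    return expectation(lambda s: (f(s) - mean) ** 2)      # Equation 4.2.10, finite case

def mean_absolute_deviation(f):
    mean = expectation(f)
    return expectation(lambda s: abs(f(s) - mean))

def total(s):
    return s[0] + s[1]

var = variance(total)
print(var, float(var))                          # 35/6 = 5.833...
print(math.sqrt(var))                           # standard deviation
print(float(mean_absolute_deviation(total)))    # mean absolute deviation
```

The variance is exactly 35/6, the standard deviation is about 2.415, and the mean absolute deviation is 35/18, about 1.94.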


Probabilities and multiple integrals 

Earlier we discussed the functions $\mu_h$ and $\mu_w$, the first giving probabilities for
height of a 10-year-old girl chosen at random, the second giving probabilities
for weight. Can these functions answer the question: what is the probability
that a 10-year-old girl chosen at random will have height between 54 and 55
inches, and weight between 70 and 71 pounds? The answer is no. Computing a
“joint” probability as the product of “single” probabilities only works when the
probabilities under study are independent. We certainly can’t expect weight to
be independent of height.

To construct a probability density function $\mu$ in two variables, height and
weight, one needs more information than the information needed to construct
$\mu_h$ and $\mu_w$ separately. One can imagine collecting thousands of file cards, each
one giving the height and weight of a 10-year-old girl, and distributing them
over a big grid; the region of the grid corresponding to 54-55 inches tall, 74-75
pounds, would have a very tall stack of cards, while the region corresponding
to 50-51 inches and 100-101 pounds would have a much smaller stack; the
region corresponding to 50-51 inches and 10-11 pounds would have none. The
distribution of these cards corresponds to the density probability function $\mu$.
Its graph will probably look like a mountain, but with a ridge along some curve
of the form $w = Ch^3$, since roughly you would expect the weight to scale like
the volume, which should be roughly proportional to the cube of the height.

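We can compute $\mu_h$ and $\mu_w$ from $\mu$:
$$\mu_h(h) = \int_{\mathbb{R}} \mu\begin{pmatrix} h \\ w \end{pmatrix} |dw|, \qquad \mu_w(w) = \int_{\mathbb{R}} \mu\begin{pmatrix} h \\ w \end{pmatrix} |dh|. \tag{4.2.13}$$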
You might expect that
$$\int_{\mathbb{R}^2} h\,\mu\begin{pmatrix} h \\ w \end{pmatrix} |dh\,dw| = \int_{\mathbb{R}} h\,\mu_h(h)\,|dh|.$$
This is true, and a form of Fubini's theorem to be developed in Section 4.5.
If we have thrown away information about height-weight distribution, we can
still figure out height expectation from the cards on the height-axis (and
weight expectation from the cards on the weight-axis). We’ve just lost all
information about the correlation of height and weight.


If two random variables are independent, their covariance is 0, as is their
correlation coefficient. The converse is not true.

By “independent,” we mean that the corresponding probability measures are
independent: if $f$ is associated with $\mu_h$, and $g$ is associated with $\mu_w$, then
$f$ and $g$ are independent, and $\mu_h$ and $\mu_w$ are independent, if
$$\mu_h(x)\,\mu_w(y) = \mu\begin{pmatrix} x \\ y \end{pmatrix},$$
where $\mu$ is the density probability function corresponding to the variable
$\begin{pmatrix} x \\ y \end{pmatrix}$.



But the converse is not true. If we have our file cards neatly distributed over
the height-weight grid, we could cut each file card in half and put the half giving
height on the corresponding interval of the h-axis and the half giving weight
on the corresponding interval of the w-axis, which results in $\mu_h$ and $\mu_w$. (This
corresponds to Equation 4.2.13.) In the process we throw away information:
from our stacks on the h and w axes we would not know how to distribute the
cards on the height-weight grid.¹

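Computing the expectation for a random variable associated to the
height-weight experiment requires a double integral. If you were interested in the
average weight for 10-year-old girls whose height is close to average, you might
compute the expectation of the random variable $f$ satisfying
$$f\begin{pmatrix} h \\ w \end{pmatrix} = \begin{cases} w & \text{if } E(f_H) - \epsilon \le h \le E(f_H) + \epsilon \\ 0 & \text{otherwise.} \end{cases} \tag{4.2.14}$$
The expectation of this function would be
$$E(f) = \int_{\mathbb{R}^2} f\begin{pmatrix} h \\ w \end{pmatrix} \mu\begin{pmatrix} h \\ w \end{pmatrix} |dh\,dw|. \tag{4.2.15}$$
A double integral would also be necessary to compute the covariance of the
random variables $f_H$ and $f_W$.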


Definition 4.2.9 (Covariance). Let $S_1$ be the sample space of one
experiment, and $S_2$ be the sample space for another. If $f : S_1 \to \mathbb{R}$ and $g : S_2 \to \mathbb{R}$
are random variables, their covariance, denoted $\text{Cov}(f, g)$, is:
$$\text{Cov}(f, g) = E\bigl((f - E(f))(g - E(g))\bigr). \tag{4.2.16}$$


The product $(f - E(f))(g - E(g))$ is positive when both $f$ and $g$ are on the
same side of their mean (both less than average, or both more than average),
and negative when they are on opposite sides, so the covariance is positive when
$f$ and $g$ vary “together,” and negative when they vary “opposite.”

Finally, we have the correlation coefficient of $f$ and $g$:


Definition 4.2.10 (Correlation coefficient). The correlation coefficient
of two random variables $f$ and $g$, denoted $\text{corr}(f, g)$, is given by the formula
$$\text{corr}(f, g) = \frac{\text{Cov}(f, g)}{\sigma(f)\,\sigma(g)}. \tag{4.2.17}$$


The correlation is always a number between -1 and 1, and has no units. 
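Covariance and correlation are easiest to experiment with on a finite sample space. The sketch below makes a simplifying assumption: both random variables are defined on the same sample space (two unloaded dice), so that their product has an obvious expectation. The names `first`, `second`, and `total` are ours.

```python
import math
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # two unloaded dice
p = Fraction(1, len(outcomes))

def expectation(f):
    return sum(f(s) * p for s in outcomes)

def covariance(f, g):
    mean_f, mean_g = expectation(f), expectation(g)
    return expectation(lambda s: (f(s) - mean_f) * (g(s) - mean_g))   # Equation 4.2.16

def correlation(f, g):
    # Equation 4.2.17; note that Cov(f, f) = Var(f) = sigma(f)^2.
    return float(covariance(f, g)) / math.sqrt(covariance(f, f) * covariance(g, g))

def first(s):
    return s[0]

def second(s):
    return s[1]

def total(s):
    return s[0] + s[1]

print(covariance(first, second))    # independent dice: covariance 0
print(covariance(first, total))     # the total varies "together" with the first die
print(correlation(first, total))    # a unitless number between -1 and 1
```

The two dice are independent, so their covariance is 0; the first die and the total have covariance 35/12 and correlation $1/\sqrt{2} \approx 0.707$.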

¹If $\mu_h$ and $\mu_w$ were independent, then we could compute $\mu$ from $\mu_h$ and $\mu_w$; in
that case, we would have $\mu = \mu_h \mu_w$.
You should notice the similarities between these definitions and

the length squared of vectors, analogous to the variance;
the length of vectors, analogous to the standard deviation;
the dot product, analogous to the covariance;
the cosine of the angle between two vectors, analogous to the correlation.

Exercise 4.2.2 explores these analogies.

In particular, “correlation 0” corresponds to “orthogonal.”


The graph of the normal distribution is a bell curve.


As n grows, all the detail of the original experiment gets ironed out,
leaving only the normal distribution.

The standard deviation of the new experiment (the “repeat the experiment n
times and take the average” experiment) is the standard deviation of the
initial experiment divided by $\sqrt{n}$.

Equation 4.2.20 puts the complication in the exponent; Equation 4.2.21 puts
it in the domain of integration.

The exponent for e in Equation 4.2.20 is hard to read; it is
$-\frac{n}{2}\left(\frac{x - E}{\sigma}\right)^2$.

Central limit theorem 

One probability density is ubiquitous in probability theory: the normal
distribution given by
$$\mu(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}. \tag{4.2.18}$$

The object of this subsection is to explain why. 

The theorem that makes the normal distribution important is the central
limit theorem. Suppose you have an experiment and a random variable, with
expected value E and standard deviation σ. Suppose that you repeat the
experiment n times, with results $x_1, \dots, x_n$. Then the central limit theorem
asserts that the average

$$\bar{x} = \frac{1}{n}(x_1 + \dots + x_n) \tag{4.2.19}$$

is approximately distributed according to the normal distribution with mean E
and standard deviation $\sigma/\sqrt{n}$, the approximation getting better and better as
$n \to \infty$. Whatever experiment you perform, if you repeat it and average, the
normal distribution will describe the results.
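
The claim about the standard deviation of the average can be checked numerically. The sketch below is only an illustration under assumptions of our own: the experiment is the roll of a fair six-sided die (expected value 3.5, variance 35/12), and the seed, the number of repetitions n and the number of trials are arbitrary choices.

```python
import math
import random
import statistics

def average_of_rolls(n, rng):
    """Roll a fair six-sided die n times and return the average result."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

rng = random.Random(0)        # fixed seed so that the run is reproducible
n = 100                       # repetitions inside one "averaged" experiment
trials = 5000                 # number of averages we collect

averages = [average_of_rolls(n, rng) for _ in range(trials)]

E = 3.5                       # expected value of a single roll
sigma = math.sqrt(35 / 12)    # standard deviation of a single roll
predicted_sd = sigma / math.sqrt(n)
observed_mean = statistics.mean(averages)
observed_sd = statistics.pstdev(averages)

print(f"mean of the averages: {observed_mean:.4f} (E = {E})")
print(f"sd of the averages:   {observed_sd:.4f} (sigma/sqrt(n) = {predicted_sd:.4f})")
```

A histogram of `averages` would show the bell shape; the sketch only compares the first two moments.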

Below we will justify this statement in the case of coin tosses. First let us
see how to translate the statement above into formulas. There are two ways
of doing it. One is to say that the probability that $\bar{x}$ is between A and B is
approximately

$$\frac{\sqrt{n}}{\sqrt{2\pi}\,\sigma} \int_A^B e^{-\frac{n}{2}\left(\frac{x-E}{\sigma}\right)^2}\,dx. \tag{4.2.20}$$

We will use the other in our formal statement of the theorem. For this we make
the change of variables $A = E + \sigma a/\sqrt{n}$, $B = E + \sigma b/\sqrt{n}$.
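
As a check on this change of variables, here is a sketch of the substitution. It assumes that Equation 4.2.20 is the integral of the normal density with mean E and standard deviation $\sigma/\sqrt{n}$; the variable t is introduced here only for illustration.

```latex
\begin{aligned}
t &= \frac{\sqrt{n}}{\sigma}\,(x - E), \qquad dt = \frac{\sqrt{n}}{\sigma}\,dx, \\
x &= A = E + \frac{\sigma a}{\sqrt{n}} \implies t = a, \qquad
x = B = E + \frac{\sigma b}{\sqrt{n}} \implies t = b, \\
\frac{\sqrt{n}}{\sqrt{2\pi}\,\sigma} \int_A^B e^{-\frac{n}{2}\left(\frac{x-E}{\sigma}\right)^2}\,dx
  &= \frac{1}{\sqrt{2\pi}} \int_a^b e^{-t^2/2}\,dt .
\end{aligned}
```

The complication has moved from the exponent into the limits of integration, which now count standard deviations of the averaged experiment.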






There are a great many im- 
provements on and extensions of 
the central limit theorem; we can- 
not hope to touch upon them here. 


Theorem 4.2.11 (The central limit theorem). If an experiment and a
random variable have expectation value E and standard deviation σ, then
if the experiment is repeated n times, with average result $\bar{x}$, the probability
that $\bar{x}$ is between $E + \frac{\sigma}{\sqrt{n}}a$ and $E + \frac{\sigma}{\sqrt{n}}b$ is approximately

$$\frac{1}{\sqrt{2\pi}} \int_a^b e^{-t^2/2}\,dt. \tag{4.2.21}$$

We prove a special case of the central limit theorem in Appendix A.12. The
proof uses Stirling's formula, a very useful result showing how the factorial n!
behaves as n becomes large. We recommend reading it if time permits, as it
makes interesting use of some of the notions we have studied so far (Taylor
polynomials and Riemann sums) as well as some you should remember from
high school (logarithms and exponentials).


Recall the binomial coefficient:

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}.$$


One way to get the answer in 
Equation 4.2.24 is to look up a ta- 
ble giving values for the “standard 
normal distribution function.” 

Another is to use some software.
With Matlab, we use .5 erf to get:

EDU> a = .5*erf(20/sqrt(2000))
a = 0.236455371567231
EDU> b = .5*erf(40/sqrt(2000))
b = 0.397048394633966
EDU> b-a
ans = 0.160593023066735

The “error function” erf is related
to the “standard normal distribution
function” as follows:

$$\frac{1}{\sqrt{2\pi}} \int_0^a e^{-t^2/2}\,dt = \frac{1}{2}\operatorname{erf}\left(\frac{a}{\sqrt{2}}\right).$$





Example 4.2.12 (Coin toss). As a first example, let us see how the central 
limit theorem answers the question: what is the probability that a fair coin 
tossed 1000 times will come up heads between 510 and 520 times? 

In principle, this is straightforward: just compute the sum

$$\frac{1}{2^{1000}} \sum_{k=510}^{520} \binom{1000}{k}. \tag{4.2.22}$$

In practice, computing these numbers would be extremely cumbersome; it is
much easier to use the central limit theorem. Our individual experiment consists
of throwing a coin, and our random variable returns 1 for “heads” and 0 for
“tails.” This random variable has expectation E = .5 and standard deviation
σ = .5 also, and we are interested in the probability of the average being between
.51 and .52. Using the version of the central limit theorem in Equation 4.2.20,
we see that the probability is approximately

$$\frac{2\sqrt{1000}}{\sqrt{2\pi}} \int_{.51}^{.52} e^{-\frac{1000}{2}\left(\frac{x-.5}{.5}\right)^2}\,dx. \tag{4.2.23}$$

Now we set

$$1000\left(\frac{x-.5}{.5}\right)^2 = t^2, \quad\text{so that } 2\sqrt{1000}\,dx = dt.$$

Substituting $t^2$ and dt in Equation 4.2.23 we get

$$\frac{1}{\sqrt{2\pi}} \int_{20/\sqrt{1000}}^{40/\sqrt{1000}} e^{-t^2/2}\,dt \approx 0.1606. \tag{4.2.24}$$

Does this seem large to you? It does to most people. △
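
With exact integer arithmetic the “cumbersome” sum is easy to evaluate by machine, so the two approaches of Example 4.2.12 can be compared directly. The following Python sketch is an illustration only; the function names are ours, and the rescaling of the bounds assumes E = σ = .5 as in the example.

```python
import math

def exact_probability(n, lo, hi):
    """Probability that a fair coin tossed n times shows between lo and hi heads, inclusive."""
    favorable = sum(math.comb(n, k) for k in range(lo, hi + 1))
    return favorable / 2 ** n        # exact integers; converted to a float only here

def clt_probability(n, lo, hi):
    """Central limit approximation: area under the bell curve between the rescaled bounds."""
    E, sigma = 0.5, 0.5
    a = (lo / n - E) * math.sqrt(n) / sigma
    b = (hi / n - E) * math.sqrt(n) / sigma
    # (1/sqrt(2*pi)) * integral from 0 to a of exp(-t^2/2) dt equals erf(a/sqrt(2)) / 2
    return 0.5 * (math.erf(b / math.sqrt(2)) - math.erf(a / math.sqrt(2)))

exact = exact_probability(1000, 510, 520)
approx = clt_probability(1000, 510, 520)
print(f"exact binomial sum: {exact:.4f}")
print(f"CLT approximation:  {approx:.4f}")
```

The two numbers should not be expected to agree to many digits: the theorem is a statement about large n, and the sum counts whole numbers of heads, including both endpoints.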


Computations like this are used 
everywhere: when drug companies 
figure out how large a population 
to try out a new drug on, when 
industries figure out how long a 
product can be expected to last, 
etc. 


Example 4.2.13 (Political poll). How many people need to be polled to call 
an election, with a probability of 95% of being within 1% of the “true value”? 
A mathematical model of this is tossing a biased coin, which falls heads with 
unknown probability p and tails with probability 1 - p. If we toss this coin 
n times (i.e., sample n people) and return 1 for heads and 0 for tails (1 for 
candidate A and 0 for candidate B), the question is: how large does n need to 
be in order to achieve 95% probability that the average we get is within 1% of p?

Figure 4.2.3 gives three typical
values for the area under the bell 
curve; these values are useful to 
know. For other values, you need 
to use a table or software, as de- 
scribed in Example 4.2.12. 


You need to know something about the bell curve to answer this, namely
that 95% of the mass is within two standard deviations of the mean (and it is
a good idea to memorize Figure 4.2.3). That means that we want 1% to be
two standard deviations of the experiment of asking n people. The experiment
of asking one person has standard deviation $\sigma = \sqrt{p(1-p)}$. Of course, p is
what we don’t know, but the maximum of $\sqrt{p(1-p)}$ is 1/2 (which occurs for
p = 1/2). So we will be safe if we choose n so that twice the standard deviation
$\sigma/\sqrt{n}$, computed with σ = 1/2, is 1%:

$$2 \cdot \frac{1}{2\sqrt{n}} = \frac{1}{100}, \quad\text{i.e., } n = 10\,000. \tag{4.2.25}$$

How many would you need to ask if you wanted to be 95% sure to be within
2% of the true value? Check below.² △


Figure 4.2.3. For the normal distribution, 68 percent of the probability is within
one standard deviation; 95 percent is within two standard deviations; 99 percent is
within 2.5 standard deviations.


4.3 What Functions Can Be Integrated?

What functions are integrable? It would be fairly easy to build up a fair
collection by ad hoc arguments, but instead we prove in this section three
theorems answering that question. They will tell us what functions are integrable,
and in particular will guarantee that all usual functions are.

The first is based on our notion of dyadic pavings. The second states that
any continuous function on $\mathbb{R}^n$ with bounded support is integrable. The third
is stronger than the second; it tells us that a function with bounded support
does not have to be continuous everywhere to be integrable; it is enough to
require that it be continuous except on a set of volume 0.

This third criterion is adequate for most functions that you will meet.
However, it is not the strongest possible statement. In the optional Section 4.4 we
prove a harder result: a function $f : \mathbb{R}^n \to \mathbb{R}$, bounded and with bounded
support, is integrable if and only if it is continuous except on a set of measure
0. The notion of measure 0 is rather subtle and surprising; with this notion,

² The number is 2500. Note that gaining 1% quadrupled the price of the poll.



This follows the rule that there 
is no free lunch: we don’t work 
very hard, so we don’t get much 
for our work. 


Recall that $\mathcal{D}_N$ denotes the collection
of all cubes at a single
level N, and that $\operatorname{osc}_C(f)$ denotes
the oscillation of f over C: the
difference between its least upper
bound and greatest lower bound,
over C.

Epsilon has the units of $\operatorname{vol}_n$. If
n = 2, epsilon is measured in centimeters
(or meters . . . ) squared;
if n = 3 it is measured in centimeters
(or whatever) cubed.


Figure 4.3.1.

The graph of the characteristic
function of the unit disk, $\chi_D$.

we see that some very strange functions are integrable. Such functions actually
arise in statistical mechanics. 

First, the theorem based on dyadic pavings. Although the index under the 
sum sign may look unfriendly, the proof is reasonably easy, which doesn’t mean 
that the criterion for integrability that it gives is easy to verify in practice. We 
don’t want to suggest that this theorem is not useful; on the contrary, it is the 
foundation of the whole subject. But if you want to use it directly, proving 
that your function satisfies the hypotheses is usually a difficult theorem in its 
own right. The other theorems state that entire classes of functions satisfy the 
hypotheses, so that verifying integrability becomes a matter of seeing whether 
a function belongs to a particular class. 

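Theorem 4.3.1 (Criterion for integrability). A function $f : \mathbb{R}^n \to \mathbb{R}$,
bounded and with bounded support, is integrable if and only if for all ε > 0,
there exists N such that

$$\underbrace{\sum_{\{C \in \mathcal{D}_N \,\mid\, \operatorname{osc}_C(f) > \epsilon\}} \operatorname{vol}_n C}_{\substack{\text{volume of all cubes for which}\\ \text{the oscillation of } f \text{ over the cube is } > \epsilon}} < \epsilon. \tag{4.3.1}$$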


In Equation 4.3.1 we sum the volume of only those cubes for which the oscil- 
lation of the function is more than epsilon. If, by making the cubes very small 
(choosing N sufficiently large) the sum of their volumes is less than epsilon, 
then the function is integrable: we can make the difference between the upper 
sum and the lower sum arbitrarily small; the two have a common limit. (The 
other cubes, with small oscillation, contribute arbitrarily little to the difference 
between the upper and the lower sum.) 

You may object that there will be a whole lot of cubes, so how can their
volume be less than epsilon? The point is that as N gets bigger, there are more
and more cubes, but they are smaller and smaller, and (if f is integrable) the
total volume of those where $\operatorname{osc}_C > \epsilon$ tends to 0.

Example 4.3.2 (Integrable functions). Consider the characteristic function
$\chi_D$ that is 1 on a disk and 0 outside, shown in Figure 4.3.1. Cubes C
that are completely inside or completely outside the disk have $\operatorname{osc}_C(\chi_D) = 0$.
Cubes straddling the border have oscillation equal to 1. (Actually, these cubes
are squares, since n = 2.) By choosing N sufficiently large (i.e., by making
the squares small enough), you can make the area of those that straddle the
boundary arbitrarily small. Therefore $\chi_D$ is integrable.

Of course, when we make the squares small, we need more of them to cover
the border, so that the sum of areas won’t necessarily be less than ε. But
as we divide the original border squares into smaller ones, some of them no

Figure 4.3.2.

The function sin(1/x) is integrable
over any bounded interval. The
dyadic intervals sufficiently near 0
will always have oscillation 2, but
they have small length when the
dyadic paving is fine.

The center region of Figure
4.3.2 is black because there are
infinitely many oscillations in that
region.


Note again the surprising but
absolutely standard way in which
we prove that something (here, the
difference between upper and lower
sums) is zero: we prove that it
is smaller than an arbitrary ε > 0.
(Or equivalently, that it is smaller
than u(ε), when u is a function
such that u(ε) → 0 as ε → 0.
Theorem 1.5.10 states that these
conditions are equivalent.)


longer straddle the border. This is not quite a proof; it is intended to help you 
understand the meaning of the statement of Theorem 4.3.1. 
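
The shrinking area of the border squares in Example 4.3.2 can also be observed numerically. The sketch below makes some simplifying assumptions of its own: the disk is the closed unit disk centered at the origin, squares are treated as closed (the dyadic squares of the text are semi-open, so this may count a few extra squares), and only levels N from 1 to 6 are examined.

```python
def min_max_square(lo, hi):
    """Smallest and largest value of t*t for t in the closed interval [lo, hi]."""
    smallest = 0.0 if lo <= 0.0 <= hi else min(lo * lo, hi * hi)
    return smallest, max(lo * lo, hi * hi)

def straddling_area(N):
    """Total area of the level-N squares that meet both the unit disk and its outside."""
    side = 1.0 / 2 ** N
    count = 0
    # Squares with lower-left corner (i*side, j*side), covering [-2, 2) x [-2, 2).
    for i in range(-2 * 2 ** N, 2 * 2 ** N):
        x_min, x_max = min_max_square(i * side, (i + 1) * side)
        for j in range(-2 * 2 ** N, 2 * 2 ** N):
            y_min, y_max = min_max_square(j * side, (j + 1) * side)
            touches_disk = x_min + y_min <= 1.0       # some point with x^2 + y^2 <= 1
            touches_outside = x_max + y_max > 1.0     # some point with x^2 + y^2 > 1
            if touches_disk and touches_outside:
                count += 1                            # oscillation of the characteristic function is 1 here
    return count * side * side

areas = [straddling_area(N) for N in range(1, 7)]
for N, area in enumerate(areas, start=1):
    print(N, area)
```

The number of border squares grows with N, but their total area still decreases, which is the behavior Theorem 4.3.1 asks for.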

Figure 4.3.2 shows another integrable function, sin(1/x). Near 0, we see that a
small change in x produces a big change in f(x), leading to a large oscillation.
But we can still make the difference between upper and lower sums arbitrarily
small by choosing N sufficiently large, and thus the intervals sufficiently small.
Theorem 4.3.10 justifies our statement that this function is integrable.

Example 4.3.3 (A nonintegrable function). The function that is 1 at
rational numbers in [0, 1] and 0 elsewhere is not integrable. No matter how
small you make the cubes (intervals in this case), choosing N larger and larger,
each cube will still contain both rational and irrational numbers, and will have
osc = 1. △

Proof of Theorem 4.3.1. First we will prove that the existence of such an N
implies integrability: i.e., that the lower sum $L_N(f)$ and the upper sum $U_N(f)$
converge to a common limit. Choose any ε > 0, and let N satisfy Equation
4.3.1. Then

$$U_N(f) - L_N(f) \le \underbrace{\sum_{\{C \in \mathcal{D}_N \,\mid\, \operatorname{osc}_C(f) > \epsilon\}} 2 \sup|f| \operatorname{vol}_n C}_{\text{contribution from cubes with osc} > \epsilon} + \underbrace{\sum_{\substack{\{C \in \mathcal{D}_N \,\mid\, \operatorname{osc}_C(f) \le \epsilon\\ \text{and } C \cap \operatorname{Supp}(f) \ne \emptyset\}}} \epsilon \operatorname{vol}_n C}_{\text{contribution from cubes with osc} \le \epsilon}$$

$$< \epsilon\,(2 \sup|f| + \operatorname{vol}_n C_{\text{supp}}), \tag{4.3.2}$$

where sup |f| is the supremum of |f|, and $C_{\text{supp}}$ is a cube that contains the
support of f (see Definition 4.1.2).

The first sum on the right-hand side of Equation 4.3.2 concerns only those
cubes for which osc > ε. Each such cube contributes at most $2 \sup|f| \operatorname{vol}_n C$ to
the maximum difference between upper and lower sums. (It is 2 sup |f| rather
than sup |f| because the value of f over a single cube might swing from a
positive number to a negative one. We could also express this difference as
sup f − inf f.)

The second sum concerns the cubes for which osc ≤ ε. We must specify that
we count only those cubes for which f has, at least somewhere in the cube, a
nonzero value; that is why we say $\{C \mid C \cap \operatorname{Supp}(f) \ne \emptyset\}$. Since by definition
the oscillation for each of those cubes is at most ε, each contributes at most
$\epsilon \operatorname{vol}_n C$ to the difference between upper and lower sums.

We have assumed that it is possible to choose N such that the cubes for
which osc > ε have total volume less than ε, so we replace the first sum by
2ε sup |f|. Factoring out ε, we see that by choosing N sufficiently large, the
upper and lower sums can be made arbitrarily close. Therefore, the function is
integrable. This takes care of the “if” part of Theorem 4.3.1.

To review how to negate statements,
see Section 0.2.


For the “only if” part we must prove that if the function is integrable, then
there exists an appropriate N. Suppose not. Then there exists one epsilon,
$\epsilon_0 > 0$, such that for all N we have

$$\sum_{\{C \in \mathcal{D}_N \,\mid\, \operatorname{osc}_C(f) > \epsilon_0\}} \operatorname{vol}_n C \ge \epsilon_0. \tag{4.3.3}$$

Now for any N we will have 


You might object that in Equation
4.3.2 we argued that by making
the ε in the last line small,
we could get the upper and lower
sums to converge to a common
limit. Now in Equation 4.3.4 we
argue that the $\epsilon_0^2$ in the last line
means the sums don’t converge;
yet the square of a small number
is smaller yet. The crucial difference
is that Equation 4.3.4 concerns
one particular $\epsilon_0 > 0$, which
is fixed and won’t get any smaller,
while Equation 4.3.2 concerns any
ε > 0, which we can choose
arbitrarily small.


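$$U_N(f) - L_N(f) = \sum_{C \in \mathcal{D}_N} \operatorname{osc}_C(f) \operatorname{vol}_n C \ge \sum_{\{C \in \mathcal{D}_N \,\mid\, \operatorname{osc}_C(f) > \epsilon_0\}} \underbrace{\operatorname{osc}_C(f)}_{> \epsilon_0} \operatorname{vol}_n C \ge \epsilon_0^2. \tag{4.3.4}$$

The sum of $\operatorname{vol}_n C$ is at least $\epsilon_0$, by Equation 4.3.3, so the upper and the
lower integrals will differ by at least $\epsilon_0^2$, and will not tend to a common limit.
But we started with the assumption that the function is integrable. □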


Theorem 4.3.1 has several important corollaries. Sometimes it is easier to 
deal with non-negative functions than with functions that can take on both 
positive and negative values; Corollary 4.3.5 shows how to deal with this. 


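Definition 4.3.4 ($f^+$ and $f^-$). If $f : \mathbb{R}^n \to \mathbb{R}$ is any function, then set

$$f^+(x) = \begin{cases} f(x) & \text{if } f(x) \ge 0 \\ 0 & \text{if } f(x) < 0 \end{cases} \qquad\text{and}\qquad f^-(x) = \begin{cases} -f(x) & \text{if } f(x) \le 0 \\ 0 & \text{if } f(x) > 0. \end{cases}$$

Clearly both $f^+$ and $f^-$ are non-negative functions, and $f = f^+ - f^-$.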
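
A short Python sketch of this decomposition may help fix the signs. The helper names and the sample function are hypothetical choices for illustration; the sketch simply takes the positive part to be max(f, 0) and the negative part to be max(−f, 0), so that both are non-negative.

```python
def positive_part(f):
    """Return f+ : equal to f where f >= 0, and 0 elsewhere."""
    return lambda x: max(f(x), 0.0)

def negative_part(f):
    """Return f- : equal to -f where f <= 0, and 0 elsewhere."""
    return lambda x: max(-f(x), 0.0)

f = lambda x: x ** 3 - x               # a function taking both signs on [-2, 2]
f_plus, f_minus = positive_part(f), negative_part(f)

samples = [k / 10 for k in range(-20, 21)]
for x in samples:
    assert f_plus(x) >= 0 and f_minus(x) >= 0      # both parts are non-negative
    assert f_plus(x) - f_minus(x) == f(x)          # f = f+ - f-
    assert f_plus(x) + f_minus(x) == abs(f(x))     # |f| = f+ + f-
```

At every point at most one of the two parts is nonzero, which is what makes the oscillation comparisons in Corollary 4.3.5 work.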

Corollary 4.3.5. A bounded function with bounded support f is integrable
if and only if both $f^+$ and $f^-$ are integrable.

Proof. If $f^+$ and $f^-$ are both integrable, then so is f by Proposition 4.1.11.
For the converse, suppose that f is integrable. Consider a dyadic cube
$C \in \mathcal{D}_N(\mathbb{R}^n)$. If f is non-negative on C, then $\operatorname{osc}_C(f) = \operatorname{osc}_C(f^+)$ and
$\operatorname{osc}_C(f^-) = 0$. Similarly, if f is non-positive on C, then $\operatorname{osc}_C(f) = \operatorname{osc}_C(f^-)$ and
$\operatorname{osc}_C(f^+) = 0$. Finally, if f takes both positive and negative values, then
$\operatorname{osc}_C(f^+) \le \operatorname{osc}_C(f)$ and $\operatorname{osc}_C(f^-) \le \operatorname{osc}_C(f)$. Then Theorem 4.3.1 says that both
$f^+$ and $f^-$ are integrable. □

Proposition 4.3.6 tells us why the characteristic function of the disk discussed
in Example 4.3.2 is integrable. We argued in that example that we can make
the area of cubes straddling the boundary arbitrarily small. Now we justify that
argument. The boundary of the disk is the union of two graphs of functions;
Proposition 4.3.6 says that any bounded part of the graph of an integrable
function has volume 0.³


The graph of a function $f : \mathbb{R}^n \to \mathbb{R}$
is n-dimensional but it lives in $\mathbb{R}^{n+1}$,
just as the graph of a function
$f : \mathbb{R} \to \mathbb{R}$ is a
curve drawn in the (x, y)-plane.
The graph Γ(f) can’t intersect the
cube $C_0$ because Γ(f) is in $\mathbb{R}^{n+1}$
and $C_0$ is in $\mathbb{R}^n$. We have to add
a dimension by using $C_0 \times \mathbb{R}$.

Figure 4.3.3.

The graph of a function from
$\mathbb{R} \to \mathbb{R}$. Over the interval A,
the function has osc < ε; over
the interval B, it has osc > ε.
Above A, we keep the two cubes
that intersect the graph; above B,
we keep the entire tower of cubes,
including the basement.


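Proposition 4.3.6 (Bounded part of graph has volume 0). Let
$f : \mathbb{R}^n \to \mathbb{R}$ be an integrable function with graph Γ(f), and let $C_0 \subset \mathbb{R}^n$ be any
dyadic cube. Then

$$\operatorname{vol}_{n+1}\Bigl(\underbrace{\Gamma(f) \cap (C_0 \times \mathbb{R})}_{\text{bounded part of graph}}\Bigr) = 0. \tag{4.3.5}$$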


Proof. The proof is not so very hard, but we have two types of dyadic cubes
that we need to keep straight: the (n + 1)-dimensional cubes that intersect the
graph of the function, and the n-dimensional cubes over which the function itself
is evaluated. Figure 4.3.3 illustrates the proof with the graph of a function from
$\mathbb{R} \to \mathbb{R}$; in that figure, the x-axis plays the role of $\mathbb{R}^n$ in the theorem, and the
(x, y)-plane plays the role of $\mathbb{R}^{n+1}$. In this case we have squares that intersect
the graph, and intervals over which the function is evaluated. In keeping with
that figure, let us denote the cubes in $\mathbb{R}^{n+1}$ by S (for squares) and the cubes
in $\mathbb{R}^n$ by I (for intervals).

We need to show that the total volume of the cubes $S \in \mathcal{D}_N(\mathbb{R}^{n+1})$ that
intersect $\Gamma(f) \cap (C_0 \times \mathbb{R})$ is small when N is large. Let us choose ε, and N
satisfying the requirement of Equation 4.3.1 for that ε: we decompose $C_0$ into
n-dimensional cubes I small enough so that the total n-dimensional volume of
the cubes over which osc(f) > ε is less than ε.

Now we count the (n + 1)-dimensional cubes S that intersect the graph.
There are two kinds of these: those whose projection on $\mathbb{R}^n$ are cubes I with
osc(f) > ε, and the others. In Figure 4.3.3, B is an example of an interval with
osc(f) > ε, while A is an example of an interval with osc(f) < ε.

For the first sort (large oscillation), think of each n-dimensional cube I over
which osc(f) > ε as the ground floor of a tower of (n + 1)-dimensional cubes S
that is at most sup |f| high and goes down (into the basement) at most − sup |f|.
To be sure we have enough, we add an extra cube S at top and bottom. Each


³ It would be simpler if we could just write $\operatorname{vol}_{n+1}(\Gamma(f)) = 0$. The problem is
that our definition for integrability requires that an integrable function have bounded
support. Although the function is bounded with bounded support, it is defined on
all of $\mathbb{R}^n$. So even though it has value 0 outside of some fixed big cube, its graph
still exists outside the fixed cube, and the characteristic function of its graph does
not have bounded support. We fix this problem by speaking of the volume of the
intersection of the graph with the (n + 1)-dimensional bounded region $C_0 \times \mathbb{R}$. You
should imagine that $C_0$ is big enough to contain the support of f, though the proof
works in any case. In Section 4.11, where we define integrability of functions that are
not bounded with bounded support, we will be able to say (Corollary 4.11.8) that a
graph has volume 0.

In Equation 4.3.7 we are counting
more cubes than necessary: we
are using the entire n-dimensional
volume of $C_0$, rather than subtracting
the parts over which
osc(f) > ε.


In Section 4.11 (Proposition
4.11.7) we will be able to drop the
requirement that M be compact.


tower then contains $2(\sup|f| + 1) \cdot 2^N$ such cubes. (We multiply by $2^N$ because
that is the inverse of the height of a cube S. At N = 0, the height of a cube
is 1; at N = 1, the height is 1/2, so we need twice as many cubes to make the
same height tower.) You will see from Figure 4.3.3 that we are counting more
squares than we actually need.

How many such towers of cubes will we need? We chose N large enough so
that the total n-dimensional volume of all cubes I with osc > ε is less than ε.
The inverse of the volume of a cube I is $2^{nN}$, so there are $2^{nN}\epsilon$ intervals for
which we need towers. So to cover the region of large oscillation, we need in all

$$\underbrace{\epsilon\, 2^{nN}}_{\substack{\text{no. of cubes } I\\ \text{with osc} > \epsilon}} \;\; \underbrace{2(\sup|f| + 1)\, 2^N}_{\substack{\text{no. of cubes } S\\ \text{for one } I \text{ with osc}(f) > \epsilon}} \tag{4.3.6}$$

(n + 1)-dimensional cubes S.

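For the second sort (small oscillation), for each cube I we require at most
$2^N \epsilon + 2$ cubes S, giving in all

$$\underbrace{2^{nN} \operatorname{vol}_n(C_0)}_{\substack{\text{no. of cubes } I\\ \text{to cover } C_0}} \;\; \underbrace{(2^N \epsilon + 2)}_{\substack{\text{no. of cubes } S\\ \text{for one } I \text{ with osc}(f) \le \epsilon}} \tag{4.3.7}$$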

Adding these numbers, we find that the bounded part of the graph is covered
by

$$2^{(n+1)N}\left(2\epsilon(\sup|f| + 1) + \left(\epsilon + \frac{2}{2^N}\right)\operatorname{vol}_n(C_0)\right) \text{ cubes } S. \tag{4.3.8}$$

This is of course an enormous number, but recall that each cube has (n + 1)-
dimensional volume $1/2^{(n+1)N}$, so the total volume is

$$2\epsilon(\sup|f| + 1) + \left(\epsilon + \frac{2}{2^N}\right)\operatorname{vol}_n(C_0), \tag{4.3.9}$$

which can be made arbitrarily small. □
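
To see the covering of a graph by small squares in action, the following Python sketch counts squares for one specific, hypothetical example: the function x ↦ x² over the interval [0, 1], playing the role of the bounded part of the graph. Since this function is increasing there, the squares needed above each dyadic interval form a single vertical run; the intervals are treated as closed, so the sketch may count slightly more squares than strictly necessary, in the same spirit as the proof.

```python
def covering_area(N):
    """Total area of the level-N dyadic squares used to cover the graph of x -> x*x over [0, 1]."""
    scale = 2 ** N
    squares = 0
    for i in range(scale):
        # Over the interval [i/2^N, (i+1)/2^N] the function increases from
        # (i/2^N)^2 to ((i+1)/2^N)^2; the row containing a height v is floor(v * 2^N).
        lowest_row = (i * i) // scale
        highest_row = ((i + 1) * (i + 1)) // scale
        squares += highest_row - lowest_row + 1
    return squares / scale ** 2          # each square has area 1/4^N

for N in (2, 4, 6, 8, 10):
    print(N, covering_area(N))
```

The count of squares grows with N, but only linearly in $2^N$, while the area of each square shrinks like $1/4^N$; the total area therefore tends to 0.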


As you would expect, a curve in the plane has area 0, a surface in $\mathbb{R}^3$ has
volume 0, and so on. Below we must stipulate that such manifolds be compact,
since we have defined volume only for bounded subsets of $\mathbb{R}^n$.


Proposition 4.3.7. If $M \subset \mathbb{R}^n$ is a manifold embedded in $\mathbb{R}^n$, of dimension
k < n, then any compact subset $X \subset M$ satisfies $\operatorname{vol}_n(X) = 0$. In particular,
any bounded part of a subspace of dimension k < n has n-dimensional volume
0.


Proof. We can choose for each x ∈ X a neighborhood $U \subset \mathbb{R}^n$ of x such
that M ∩ U is a graph of a function expressing n − k coordinates in terms of
the other k. Since X is compact, a finite number of these neighborhoods cover

The equation g(x) = 0 that defines
M ∩ U is n − k equations in
n unknowns. Any point x satisfying
g(x) = 0 necessarily satisfies
$g_1(x) = 0$, so $M \cap U \subset M_1$.

The second part of the proof is
just spelling out the obvious fact
that since (for example) a surface
in $\mathbb{R}^3$ has three-dimensional volume
0, so does a curve on that surface.


The terms compact support and 
bounded support mean the same 
thing. 

Our proof of Theorem 4.3.8 ac- 
tually proves a famous and much 
stronger theorem: every continu- 
ous function with bounded sup- 
port is uniformly continuous (see 
Section 0.2 for a discussion of uni- 
form continuity). 

This is stronger than Theorem 
4.3.8 because it shows that the os- 
cillation of a continuous function 
is small everywhere, whereas inte- 
grability requires only that it be 
small except on a small set. (For 
example, the characteristic func- 
tion of the disk in Example 4.3.2 
is integrable, although the oscilla- 
tion is not small on the cubes that 
straddle the boundary.) 


X, so it is enough to prove $\operatorname{vol}_n(M \cap U) = 0$ for such a neighborhood. In the
case k = n − 1, this follows from Proposition 4.3.6. Otherwise, there exists a
mapping

$$g = \begin{pmatrix} g_1 \\ \vdots \\ g_{n-k} \end{pmatrix} : U \to \mathbb{R}^{n-k}$$

such that M ∩ U is defined by the equation
g(x) = 0, and such that [Dg(x)] is onto $\mathbb{R}^{n-k}$ for every x ∈ M ∩ U. Then the
locus $M_1$ given by just the first of these equations, $g_1(x) = 0$, is a manifold of
dimension n − 1 embedded in U, so it has n-dimensional volume 0, and since
$M \cap U \subset M_1$, we also have $\operatorname{vol}_n(M \cap U) = 0$. □


What functions satisfy the hypothesis of Theorem 4.3.1? One important 
class is the class of continuous functions with bounded support. To prove that 
such functions are integrable we will need a result from topology — Theorem 
1.6.2 about convergent subsequences. 

Theorem 4.3.8. Any continuous function on $\mathbb{R}^n$ with bounded support is
integrable.


Our previous criterion for integrability, Theorem 4.3.1, defines integrability in 
terms of dyadic decompositions. It might appear that whether or not a function 
is integrable could depend on where the function fits on the grid of dyadic cubes; 
if you nudge the function a bit, might you get different results? Theorem 4.3.8 
says nothing about dyadic decompositions, so we see that integrability does not 
depend on how the function is nudged; in mathematical language, integrability 
is translation invariant. 

Proof. Suppose the theorem is false; then there certainly exists an $\epsilon_0 > 0$ such
that for every N, the total volume of all cubes $C_N \in \mathcal{D}_N$ with osc > $\epsilon_0$ is at
least $\epsilon_0$. In particular, a cube $C_N \in \mathcal{D}_N$ must exist such that $\operatorname{osc}_{C_N}(f) > \epsilon_0$. We
can restate this in terms of distance between points: $C_N$ contains two points
$x_N, y_N$ such that

$$|f(x_N) - f(y_N)| > \epsilon_0. \tag{4.3.10}$$

These points are in the support of f, so they form two bounded sequences:
the infinite sequence composed of the points $x_N$ for all N, and the infinite
sequence composed of the points $y_N$ for all N. By Theorem 1.6.2 we can extract
a convergent subsequence $x_{N_i}$ that converges to some point a. By Equation
4.1.17,

$$|x_{N_i} - y_{N_i}| \le \frac{\sqrt{n}}{2^{N_i}}, \tag{4.3.11}$$

so we see that $y_{N_i}$ also converges to a.

Figure 4.3.4.


The black curve represents A; 
the darkly shaded region consists 
of the cubes at some level that 
intersect A. The lightly shaded 
region consists of the cubes at the 
same depth that border at least 
one of the previous cubes. 



Figure 4.3.5. 


In $\mathbb{R}^2$, $3^2 - 1 = 8$ cubes are
enough to completely surround a
cube $C_i$. In $\mathbb{R}^3$, $3^3 - 1 = 26$ cubes
are enough to completely surround
a cube $C_i$. If we include the cube
$C_i$, then $3^2$ cubes are enough in
$\mathbb{R}^2$, and $3^3$ in $\mathbb{R}^3$.
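
The counts in Figure 4.3.5 can be confirmed by enumeration. In this sketch a cube at a given level is identified by a vector of integer indices (an assumption made only for the illustration), and its neighbors are the cubes whose indices differ by −1, 0 or 1 in each coordinate, excluding the cube itself.

```python
from itertools import product

def surrounding_cubes(index):
    """Index vectors of the same-level cubes that touch the cube with the given index vector."""
    n = len(index)
    return [tuple(c + d for c, d in zip(index, offset))
            for offset in product((-1, 0, 1), repeat=n)
            if any(offset)]               # skip the all-zero offset: that is the cube itself

print(len(surrounding_cubes((0, 0))))     # neighbors of a square in the plane
print(len(surrounding_cubes((0, 0, 0))))  # neighbors of a cube in space
```

In dimension n there are $3^n$ offsets in all, one of which is the cube itself, which gives the count $3^n - 1$.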


Since f is continuous at a, then for any ε there exists δ such that if |x − a| < δ
then |f(x) − f(a)| < ε; in particular we can choose $\epsilon = \epsilon_0/4$, so
$|f(x) - f(a)| < \epsilon_0/4$.

For $N_i$ sufficiently large, $|x_{N_i} - a| < \delta$ and $|y_{N_i} - a| < \delta$. Thus (using the
triangle inequality, Theorem 1.4.9),

$$\underbrace{\epsilon_0 < |f(x_{N_i}) - f(y_{N_i})|}_{\text{Equation 4.3.10}} \le |f(x_{N_i}) - f(a)| + |f(a) - f(y_{N_i})| < \frac{\epsilon_0}{2}. \tag{4.3.12}$$

(The middle inequality compares the distance as the crow flies with the distance
when the crow takes the scenic route.)

But $\epsilon_0 < \epsilon_0/2$ is false, so our hypothesis is faulty: f is integrable. □

Corollary 4.3.9. Any bounded part of the graph of a continuous function 
has volume 0. 

A function need not be continuous everywhere to be integrable, as our third
theorem shows. This theorem is much harder to prove than the first two, but
the criterion for integrability is much more useful.

Theorem 4.3.10. A function $f : \mathbb{R}^n \to \mathbb{R}$, bounded with bounded support,
is integrable if it is continuous except on a set of volume 0.

Note that Theorem 4.3.10, like Theorem 4.3.8 but unlike Theorem 4.3.1, is
not an “if and only if” statement. As will be seen in the optional Section 4.4, it
is possible to find functions that are discontinuous at all the rationals, yet still
are integrable.

Proof. Denote by Δ (“delta”) the set of points where f is discontinuous:

$$\Delta = \{x \in \mathbb{R}^n \mid f \text{ is not continuous at } x\}. \tag{4.3.13}$$

Choose some ε > 0. Since f is continuous except on a set of volume 0, we
have $\operatorname{vol}_n \Delta = 0$. So (by Definition 4.1.18) there exists N and some finite union
of cubes $C_1, \dots, C_k \in \mathcal{D}_N(\mathbb{R}^n)$ such that

$$\Delta \subset C_1 \cup \dots \cup C_k \quad\text{and}\quad \sum_{i=1}^{k} \operatorname{vol}_n C_i < \frac{\epsilon}{3^n}. \tag{4.3.14}$$

Now we create a “buffer zone” around the discontinuities: let L be the union
of the $C_i$ and all the surrounding cubes at level N, as shown in Figure 4.3.4. As
illustrated by Figure 4.3.5, we can completely surround each $C_i$ using $3^n - 1$
cubes ($3^n$ including itself). Since the total volume of all the $C_i$ is less than
$\epsilon/3^n$,

$$\operatorname{vol}_n(L) < \epsilon. \tag{4.3.15}$$

Moreover, since the length of a side of a cube is $1/2^N$, every point of $\mathbb{R}^n - L$
is at least $1/2^N$ away from Δ.

All that remains is to show that there exists M > N such that if $C \in \mathcal{D}_M$
and $C \not\subset L$, then $\operatorname{osc}_C(f) < \epsilon$. If we can do that, we will have shown that a
decomposition exists at which the total volume of all cubes over which osc(f) >
ε is less than ε, which is the criterion for integrability given by Theorem 4.3.1.

Suppose no such M exists. Then for every M > N, there is a cube $C \in \mathcal{D}_M$
with $C \not\subset L$ and points $x_M, y_M \in C$ with $|f(x_M) - f(y_M)| > \epsilon$.

The $x_M$ are a bounded sequence in $\mathbb{R}^n$, so we can extract a subsequence $x_{M_i}$
that converges to some point a. Since (again using the triangle inequality)

$$|f(x_{M_i}) - f(a)| + |f(a) - f(y_{M_i})| \ge |f(x_{M_i}) - f(y_{M_i})| > \epsilon, \tag{4.3.16}$$

we see that at least one of $|f(x_{M_i}) - f(a)|$ and $|f(y_{M_i}) - f(a)|$ does not converge
to 0, so f is not continuous at a, i.e., a ∈ Δ. But this contradicts the fact that
a is a limit of points outside of L. Since all $x_{M_i}$ are at least $1/2^N$ away from
points of Δ, a is also at least $1/2^N$ away from points of Δ. □


Corollary 4.3.12 says that virtually
all examples that occur in
“vector calculus” are integrable.


Corollary 4.3.11. If f is an integrable function on $\mathbb{R}^n$, and g is another
bounded function such that f = g except on a set of volume 0, then g is
integrable, and

$$\int_{\mathbb{R}^n} f\,|d^n x| = \int_{\mathbb{R}^n} g\,|d^n x|. \tag{4.3.17}$$


Corollary 4.3.12. Let $A \subset \mathbb{R}^n$ be a region bounded by a finite union of
graphs of continuous functions, and let $f : A \to \mathbb{R}$ be continuous. Then the
function $\tilde{f} : \mathbb{R}^n \to \mathbb{R}$ that is f(x) for x ∈ A and 0 outside A is integrable.


Exercise 4.3.1 asks you to give
an explicit bound for the number
of cubes of $\mathcal{D}_N(\mathbb{R}^2)$ needed to
cover the unit circle.


In particular, the characteristic function of the disk is integrable, since the
disk is bounded by the graphs of the two functions

$$y = +\sqrt{1 - x^2} \quad\text{and}\quad y = -\sqrt{1 - x^2}. \tag{4.3.18}$$


4.4 Integration and Measure Zero (optional) 

There is measure in all things . — Horace 

We mentioned in Section 4.3 that the criterion for integrability given by 
Theorem 4.3.10 is not sharp. It is not necessary that a function (bounded 
and with bounded support) be continuous except on a set of volume 0 to be 
integrable: it is sufficient that it be continuous except on a set of measure 0.

The measure theory approach
to integration, Lebesgue integration,
is superior to Riemann integration
from several points of
view. It makes it possible to integrate
otherwise unintegrable functions,
and it is better behaved
than Riemann integration with respect
to limits $f = \lim f_n$. However,
the theory takes much longer
to develop and is poorly adapted
to computation. For the kinds
of problems treated in this book,
Riemann integration is adequate.

Our boxes $B_i$ are open cubes.
But the theory applies to boxes
$B_i$ that have other shapes: Definition
4.4.1 works with the $B_i$ as arbitrary
sets with well-defined volume.
Exercise 4.4.2 asks you to
show that you can use balls, and
Exercise 4.4.3 asks you to show
that you can use arbitrary pavable
sets.



Figure 4.4.1.


The set X, shown as a heavy
line, is covered by boxes that overlap.


We say that the sum of the
lengths is less than ε because some
of the intervals overlap.

The set $U_\epsilon$ is interesting in its
own right; Exercise A18.1 explores
some of its bizarre properties.


Measure theory is a big topic, beyond the scope of this book. Fortunately,
the notion of measure 0 is much more accessible. Measure 0 is a subtle notion
with some bizarre consequences; it gives us a way, for example, of saying that the
rational numbers “don’t count.” Thus it allows us to use Riemann integration to
integrate some quite interesting functions, including one we explore in Example
4.4.3 as a reasonable model for space averages in statistical mechanics.

In the definition below, a box B in $\mathbb{R}^n$ of side δ > 0 will be a cube of the
form

$$\{x \in \mathbb{R}^n \mid a_i < x_i < a_i + \delta,\ i = 1, \dots, n\}. \tag{4.4.1}$$

There is no requirement that the $a_i$ or δ be dyadic.

Definition 4.4.1 (Measure 0). A set $X \subset \mathbb{R}^n$ has measure 0 if and only
if for every ε > 0, there exists an infinite sequence of open boxes $B_i$ such that

$$X \subset \bigcup_i B_i \quad\text{and}\quad \sum_i \operatorname{vol}_n(B_i) < \epsilon. \tag{4.4.2}$$


That is, the set can be contained in a possibly infinite sequence of boxes
(intervals in $\mathbb{R}$, squares in $\mathbb{R}^2$, ...) whose total volume is $< \epsilon$. The crucial
difference between measure and volume is the word infinite in Definition 4.4.1.
A set with volume 0 can be contained in a finite sequence of cubes whose total
volume is arbitrarily small. A set with volume 0 necessarily has measure 0, but
it is possible for a set to have measure 0 but not to have a defined volume, as
shown in Example 4.4.2.

We speak of boxes rather than cubes to avoid confusion with the cubes of our
dyadic pavings. In dyadic pavings, we considered “families” of cubes all of the
same size: the cubes at a particular resolution $N$, and fitting the dyadic grid.
The boxes $B_i$ of Definition 4.4.1 get small as $i$ increases, since their total volume
is less than $\epsilon$, but it is not necessarily the case that any particular box is smaller
than the one immediately preceding it. The boxes can overlap, as illustrated in
Figure 4.4.1, and they are not required to square with any particular grid.

Finally, you may have noticed that the boxes in Definition 4.4.1 are open,
while the dyadic cubes of our paving are semi-open. In both cases, this is just
for convenience: the theory could be built just as well with closed cubes and
boxes (see Exercise 4.4.1).

Example 4.4.2 (A set with measure 0, undefined volume). The set
of rational numbers in the interval [0, 1] has measure 0. You can list them
in order: 1, 1/2, 1/3, 2/3, 1/4, 2/4, 3/4, 1/5, .... (The list is infinite and includes
some numbers more than once.) Center an open interval of length $\epsilon/2$ at 1, an
open interval of length $\epsilon/4$ at 1/2, an open interval of length $\epsilon/8$ at 1/3, and
so on. Call $U_\epsilon$ the union of these intervals. The sum of the lengths of these
intervals (i.e., $\sum_i \operatorname{vol}_1(B_i)$) will be less than $\epsilon\,(1/2 + 1/4 + 1/8 + \cdots) = \epsilon$.
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The set of Example 4.4.2 is a 
good one to keep in mind while
trying to picture the boxes $B_i$,
because it helps us to see that
while the sequence $B_i$ is made of
$B_1, B_2, \dots$, in order, these boxes
may skip around. The “boxes”
here are the intervals; if $B_1$ is centered
at 1/2, then $B_2$ is centered
at 1/3, $B_3$ at 2/3, $B_4$ at 1/4, and
so on. We also see that some boxes
may be contained in others: for
example, depending on the choice
of $\epsilon$, the interval centered at 17/32
may be contained in the interval
centered at 1/2.


Statistical mechanics is an at- 
tempt to apply probability theory 
to large systems of particles, to 
estimate average quantities, like 
temperature, pressure, etc., from 
the laws of mechanics. Thermo- 
dynamics, on the other hand, is 
a completely macroscopic theory, 
trying to relate the same macro- 
scopic quantities (temperature, 
pressure, etc.) on a phenomeno- 
logical level. Clearly, one hopes to 
explain thermodynamics by statis- 
tical mechanics. 


You can place all the rationals in [0, 1] in intervals that are infinite in number
but whose total length is arbitrarily small! The set thus has measure 0. However,
it does not have a defined volume: if you were to try to measure the volume,
you would fail because you could never divide the interval [0, 1] into intervals
so small that they contain only rational numbers.
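
As a small numerical illustration of the covering in Example 4.4.2, here is a Python sketch (ours, not part of the argument). It can only handle a finite initial piece of the infinite list, and the names `listed_rationals`, `covering_intervals`, and the cutoff `max_denominator` are assumptions made for the illustration. It uses exact rational arithmetic to confirm that every listed point sits inside its interval and that the lengths add up to less than $\epsilon$.

```python
from fractions import Fraction

def listed_rationals(max_denominator):
    """Return 1, 1/2, 1/3, 2/3, 1/4, 2/4, ... (repeats included)."""
    points = [Fraction(1)]
    for q in range(2, max_denominator + 1):
        for p in range(1, q):
            points.append(Fraction(p, q))  # 2/4 is stored as 1/2: a repeat
    return points

def covering_intervals(points, eps):
    """Center an open interval of length eps / 2**i at the i-th point."""
    intervals = []
    for i, x in enumerate(points, start=1):
        half_length = eps / 2 ** (i + 1)
        intervals.append((x - half_length, x + half_length))
    return intervals

eps = Fraction(1, 100)
points = listed_rationals(12)          # a finite piece of the infinite list
intervals = covering_intervals(points, eps)
total_length = sum(right - left for left, right in intervals)

print(len(points), float(total_length), total_length < eps)
```

However far the list is extended, the total length stays below $\epsilon$, because the lengths form a geometric series.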

We already ran across this set in Example 4.3.3, when we found that we could
not integrate the function that is 1 at rational numbers in the interval [0, 1] and
0 elsewhere. This function is discontinuous everywhere; in every interval, no
matter how small, it jumps from 0 to 1 and from 1 to 0. △

In Example 4.4.3 we see a function that looks similar but is very different.
This function is continuous except over a set of measure 0, and thus is integrable.
It arises in real life (statistical mechanics, at least).

Example 4.4.3 (An integrable function with discontinuities on a set
of measure 0). The function

$$f(x) = \begin{cases} \dfrac{1}{q} & \text{if } x = \dfrac{p}{q} \text{ is rational, } |x| \le 1, \text{ and written in lowest terms} \\[1ex] 0 & \text{if } x \text{ is irrational, or } |x| > 1 \end{cases} \tag{4.4.3}$$

is integrable. The function is discontinuous at values of $x$ for which $f(x) \ne 0$.
For instance, $f(3/4) = 1/4$, while arbitrarily close to 3/4 we have irrational
numbers such that $f(x) = 0$. But such values form a set of measure 0. The
function is continuous at the irrationals: arbitrarily close to any irrational number
$x$ you will find rational numbers $p/q$, but you can choose a neighborhood
of $x$ that includes only rational numbers with arbitrarily large denominators $q$,
so that $f(y)$ will be arbitrarily small. △
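
The continuity claim can be explored numerically. The Python sketch below is only an illustration under stated assumptions: `f` evaluates the function of Equation 4.4.3 at an exactly represented rational, and the helper `sup_f_near` (a name we made up) finds the largest value of $f$ at a rational in a small window around a point, assuming the window lies strictly inside (0, 1). The center $\sqrt{2}/2$ is an arbitrary choice of irrational, represented as a float.

```python
import math
from fractions import Fraction

def f(x):
    """The function of Equation 4.4.3 at a rational x (int or Fraction)."""
    x = Fraction(x)                  # Fraction keeps p/q in lowest terms
    if abs(x) > 1:
        return Fraction(0)
    return Fraction(1, x.denominator)

def sup_f_near(center, radius):
    """Largest value of f at a rational in (center - radius, center + radius).

    Assumes 0 < center - radius and center + radius < 1. The first
    denominator q for which some p/q lands in the window is the smallest
    one, so that p/q is automatically in lowest terms and f(p/q) = 1/q.
    """
    q = 1
    while True:
        p = math.floor((center - radius) * q) + 1   # smallest p with p/q > left end
        if p / q < center + radius:
            return Fraction(1, q)
        q += 1

center = math.sqrt(2) / 2            # an irrational point (as a float)
values = [sup_f_near(center, r) for r in (0.1, 0.01, 0.001)]
print(f(Fraction(3, 4)), values)
```

As the window shrinks, the smallest available denominator grows, so the largest value of $f$ in the window drops toward 0, which is the behavior described above.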

The function of Example 4.4.3 is important because it is a model for functions 
that show up in an essential way in statistical mechanics (unlike the function 
of Example 4.4.2, which, as far as we know, is only a pathological example, 
devised to test the limits of mathematical statements). 

In statistical mechanics, one tries to describe a system, typically a gas enclosed
in a box, made up of perhaps $10^{25}$ molecules. Quantities of interest might
be temperature, pressure, concentrations of various chemical compounds, etc.

A state of the system is a specification of the position and velocity of each 
molecule (and rotational velocity, vibrational energy, etc., if the molecules have 
inner structure); to encode this information one might use a point in some 
gadzillion dimensional space. 

Mechanics tells us that at the beginning of our experiment, the system is in 
some state that evolves according to the laws of physics, “exploring” as time 
proceeds some part of the total state space (and exploring it quite fast relative to 
our time scale: particles in a gas at room temperature typically travel at several 
hundred meters per second, and undergo millions of collisions per second.) 
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We discussed Example 4.4.3 to 
show that such bizarre functions
can have physical meaning. However,
we do not mean to suggest
that because the rational numbers
have measure 0, trajectories with
rational slopes are never important
for understanding the evolution
of dynamical systems. On the
contrary: questions of rational vs.
irrational numbers are central to
understanding the intricate interplay
of chaotic and stable behavior
exhibited, for example, by the
lakes of Wada. (For more on this
topic, see J.H. Hubbard, What it
Means to Understand a Differential
Equation, The College Mathematics
Journal, Vol. 25, No. 5
(Nov. 1994), 372-384.)


The guess underlying thermodynamics is that the quantity one measures,
which is really a time average of the quantity as measured along the trajectory
of the system, should be nearly equal in the long run to the average over all
possible states, called the space average. (Of course the “long run” is quite a
short run by our clocks.)

This equality of time averages and space averages is called Boltzmann's er- 
godic hypothesis. There aren’t many mechanical systems where it is mathemat- 
ically proved to be true, but physicists believe that it holds in great generality, 
and it is the key hypothesis that connects statistical mechanics to thermody- 
namics. 

Now what does this have to do with our function $f$ above? Even if you believe
that a generic time evolution will explore state space fairly evenly, there will 
always be some trajectories that don’t. Consider the (considerably simplified) 
model of a single particle, moving without friction on a square billiard table, 
with ordinary bouncing when it hits an edge (the angle of incidence equal to the 
angle of reflection). Then most trajectories will evenly fill up the table, in fact 
precisely those that start with irrational slope. But those with rational slopes 
emphatically will not: they will form closed trajectories, which will go over and 
over the same closed path. Still, as shown in Figure 4.4.2, these closed paths 
will visit more and more of the table as the denominator of the slope becomes 
large. 





FIGURE 4.4.2. The trajectory with slope 2/5, at center, visits more of the square 
than the trajectory with slope 1/2, at left. The slope of the trajectory at right closely 
approximates an irrational number; if allowed to continue, this trajectory would visit 
every part of the square. 
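
The behavior pictured in Figure 4.4.2 can be imitated with a short simulation. This is a rough sketch under assumptions of our own: the particle starts at the center of the unit square, the table is divided into a 20 × 20 grid of cells, and the trajectory is sampled at small steps along the "unfolded" straight line, which is then reflected back into the square. The function names and all parameters (`grid`, `length`, `step`) are hypothetical choices for the illustration.

```python
import math

def fold(u):
    """Reflect a coordinate of the unfolded straight line back into [0, 1]."""
    u = u % 2.0
    return 2.0 - u if u > 1.0 else u

def fraction_of_cells_visited(slope, grid=20, length=200.0, step=0.001):
    """Fraction of grid cells that a billiard trajectory from the center visits."""
    norm = math.hypot(1.0, slope)
    dx, dy = 1.0 / norm, slope / norm       # unit direction vector
    visited = set()
    for k in range(int(length / step) + 1):
        t = k * step
        x = fold(0.5 + t * dx)
        y = fold(0.5 + t * dy)
        i = min(int(x * grid), grid - 1)    # cell indices, clamped at the far edge
        j = min(int(y * grid), grid - 1)
        visited.add((i, j))
    return len(visited) / grid ** 2

coverage_half = fraction_of_cells_visited(1 / 2)
coverage_two_fifths = fraction_of_cells_visited(2 / 5)
coverage_irrational = fraction_of_cells_visited(math.sqrt(2))
print(coverage_half, coverage_two_fifths, coverage_irrational)
```

With these settings the closed trajectories of slope 1/2 and 2/5 leave many cells untouched, the second fewer than the first, while the trajectory whose slope approximates $\sqrt{2}$ reaches essentially the whole grid.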


Suppose further that the quantity to be observed is some function $f$ on the
table with average 0, which is positive near the center and very negative near 
the corners. Moreover, suppose we start our particle at the center of the table 
but don't specify its direction. This is some caricature of reality, where in the 
laboratory we set up the system in some macroscopic configuration, like having 
one gas in half a box and another in another half, and remove the partition. 
This corresponds to knowing something about the initial state, but is a very 
long way from knowing it exactly. 



Theorem 4.4.4 is stronger than 
Theorem 4.3.10, since any set of 
volume 0 also has measure 0, and 
not conversely. It is also stronger 
than Theorem 4.3.8. 

But it is not stronger than Theorem
4.3.1. Theorems 4.4.4 and
4.3.1 both give an “if and only if”
condition for integrability; they
are exactly equivalent. But it is often
easier to verify that a function
is integrable using Theorem 4.4.4.
It also makes it clear that whether
or not a function is integrable does
not depend on where a function is
placed on some arbitrary grid.

We prune the list of boxes by
throwing away any box that is contained
in an earlier one. We could
prove our result without pruning
the list, but it would make the argument
more cumbersome.

Recall (Definition 4.1.4) that
$\operatorname{osc}_{B_i}(f)$ is the oscillation of $f$ over
$B_i$: the difference between the
least upper bound of $f$ over $B_i$ and
the greatest lower bound of $f$ over
$B_i$.
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Trajectories through the center of the table, and with slope 0, will have 
positive time averages, as will trajectories with slope $\infty$. Similarly, we believe
that the average, over time, of each trajectory with rational slope will also be 
positive: the trajectory will miss the corners. But trajectories with irrational 
slope will have 0 time averages: given enough time these trajectories will visit 
each part of the table equally. And trajectories with rational slopes with large 
denominators will have time averages close to 0. 

Because the rational numbers have measure 0, their contribution to the av- 
erage does not matter; in this case, at least, Boltzmann’s ergodic hypothesis 
seems correct. △

Integrability of “almost” continuous functions 

We are now ready to prove Theorem 4.4.4: 

Theorem 4.4.4. A function $f : \mathbb{R}^n \to \mathbb{R}$, bounded and with bounded support,
is integrable if and only if it is continuous except on a set of measure 0.

Proof. Since this is an “if and only if” statement, we must prove both directions.
We will start with the harder one: if a function $f : \mathbb{R}^n \to \mathbb{R}$, bounded
and with bounded support, is continuous except on a set of measure 0, then it
is integrable. We will use the criterion for integrability given by Theorem 4.3.1;
thus we want to prove that for all $\epsilon > 0$ there exists $N$ such that the cubes
$C \in \mathcal{D}_N$ over which $\operatorname{osc}_C(f) > \epsilon$ have a combined volume less than $\epsilon$.

We will denote by $A$ the set of points where $f$ is not continuous, and we will
choose some $\epsilon > 0$ (which will remain fixed for the duration of the proof). By
Definition 4.4.1 of measure 0, there exists a sequence of boxes $B_i$ such that

$$A \subset \bigcup_i B_i \quad\text{and}\quad \sum_i \operatorname{vol}_n B_i < \epsilon, \tag{4.4.4}$$

and no box is contained in any other.

The proof is fairly involved. First, we want to get rid of infinity. 

Lemma 4.4.5. There are only finitely many boxes $B_i$ on which $\operatorname{osc}_{B_i}(f) > \epsilon$.

We will denote such boxes $B_{i_j}$, and denote by $L$ the union of the $B_{i_j}$.

Proof of Lemma 4.4.5. We will prove Lemma 4.4.5 by contradiction. Assume
it is false. Then there exist an infinite subsequence of boxes $B_{i_j}$, and two infinite
sequences of points, $x_j, y_j \in B_{i_j}$, such that $|f(x_j) - f(y_j)| > \epsilon$.

The sequence $x_j$ is bounded, since the support of $f$ is bounded and $x_j$ is
in the support of $f$. So (by Theorem 1.6.2) it has a convergent subsequence
$x_{j_k}$ converging to some point $p$. Since $|x_j - y_j| \to 0$ as $j \to \infty$ (both points lie
in the box $B_{i_j}$, and these boxes shrink), the subsequence $y_{j_k}$ also converges to $p$.
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The collection of boxes cover- 
ing A is lightly shaded; those with 
osc $> \epsilon$ are shaded slightly darker.
A convergent subsequence of those 
is shaded darker yet: the point p 
to which they converge must be- 
long to some box. 


You may ask, how do we know 
they converge to the same point? 
Because $x_{N_i}$, $y_{N_i}$, and $z_{N_i}$ are all
in the same cube, which is shrinking
to a point as $N_i \to \infty$.


The boxes $B_{i_j}$ making up $L$ are
in $\mathbb{R}^n$, so the complement of $L$ is
$\mathbb{R}^n - L$. Recall (Definition 1.5.4)
that a closed set $C \subset \mathbb{R}^n$ is a set
whose complement $\mathbb{R}^n - C$ is open.


The point $p$ has to be in a particular box, which we will call $B_p$. (Since the
boxes can overlap, it could be in more than one, but we just need one.) Since
$x_{j_k}$ and $y_{j_k}$ converge to $p$, and since the $B_{i_j}$ get small as $j$ gets big (their total
volume being less than $\epsilon$), then all $B_{i_{j_k}}$ after a certain point will be contained
in $B_p$. But this contradicts our assumption that we had pruned our list of
$B_i$ so that no one box was contained in any other. Therefore Lemma 4.4.5 is
correct: there are only finitely many $B_i$ on which $\operatorname{osc}_{B_i} f > \epsilon$. (Our indices
have proliferated in an unpleasant fashion. As illustrated in Figure 4.4.3, the $B_i$
are the sequence of boxes that cover $A$, i.e., the set of discontinuities; the $B_{i_j}$
are those $B_i$'s where osc $> \epsilon$; and the $B_{i_{j_k}}$ are those $B_{i_j}$'s that form a convergent
subsequence.)

Now we assert that if we use dyadic pavings to pave the support of our 
function /, then: 

Lemma 4.4.6. There exists $N$ such that if $C \in \mathcal{D}_N(\mathbb{R}^n)$ and $\operatorname{osc}_C f > \epsilon$, then
$C \subset L$.

That is, we assert that $f$ can have osc $> \epsilon$ only over $C$'s that are in $L$.
If we prove this, we will be finished, because by Theorem 4.3.1, a bounded
function with bounded support is integrable if there exists an $N$ at which the
total volume of cubes with osc $> \epsilon$ is less than $\epsilon$. We know that $L$ is a finite
union of $B_i$, and (Equation 4.4.4) that the $B_i$ have total volume $< \epsilon$.

To prove Lemma 4.4.6, we will again argue by contradiction. Suppose the
lemma is false. Then for every $N$, there exists a $C_N$ not a subset of $L$ such that
$\operatorname{osc}_{C_N} f > \epsilon$. In other words,

$$\exists \text{ points } x_N, y_N, z_N \text{ in } C_N, \text{ with } z_N \notin L, \text{ and } |f(x_N) - f(y_N)| > \epsilon. \tag{4.4.5}$$

Since $x_N$, $y_N$, and $z_N$ are infinite sequences (for $N = 1, 2, \dots$), then there
exist convergent subsequences $x_{N_i}$, $y_{N_i}$, and $z_{N_i}$, all converging to the same
point, which we will call $q$.

What do we know about q? 

• $q \in A$: i.e., it is a discontinuity of $f$. (No matter how close $x_{N_i}$ and $y_{N_i}$
get to $q$, $|f(x_{N_i}) - f(y_{N_i})| > \epsilon$.) Therefore (since all the discontinuities of the
function are contained in the $B_i$), it is in some box $B_i$, which we'll call $B_q$.

• $q \notin L$. (The set $L$ is open, so its complement is closed; since no point of
the sequence $z_{N_i}$ is in $L$, its limit, $q$, is not in $L$ either.)

Since $q \in B_q$, and $q \notin L$, we know that $B_q$ is not one of the boxes with
osc $> \epsilon$. But that isn't true, because $x_{N_i}$ and $y_{N_i}$ are in $B_q$ for $N_i$ large
enough, so that $\operatorname{osc}_{B_q} f < \epsilon$ contradicts $|f(x_{N_i}) - f(y_{N_i})| > \epsilon$.

Therefore, we have proved Lemma 4.4.6, which, as we mentioned above,
means that we have proved Theorem 4.4.4 in one direction: if a bounded func- 
tion with bounded support is continuous except on a set of measure 0, then it 
is integrable. 
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Figure 4.4.4. 


The function that is identically 
1 on the indicated dyadic cube and 
0 elsewhere is discontinuous on 
the boundary of the dyadic cube. 
For instance, the function is 0 on 
one of the indicated sequences of 
points, but its value at the limit 
is 1. This point is in the interior 
of the shaded cube of the dotted 
grid. 

Definition 4.4.1 specifies that
the boxes $B_i$ are open. Equation
4.1.13 defining dyadic cubes
shows that they are half-open: $x_i$
is greater than or equal to one
amount, but strictly less than another:

$$\frac{k_i}{2^N} \le x_i < \frac{k_i + 1}{2^N}.$$


Now we need to prove the other direction: if a function $f : \mathbb{R}^n \to \mathbb{R}$, bounded
and with bounded support, is integrable, then it is continuous except on a set of
measure 0. This is easier, but the fact that we chose our dyadic cubes half-open,
and our boxes open, introduces a little complication.

Since $f$ is integrable, we know (Theorem 4.3.1) that for any $\epsilon > 0$, there
exists $N$ such that the finite union of cubes

$$\{\, C \in \mathcal{D}_N(\mathbb{R}^n) \mid \operatorname{osc}_C(f) > \epsilon \,\} \tag{4.4.6}$$

has total volume less than $\epsilon$.

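Apply Equation 4.4.6, setting $\epsilon_1 = \delta/4$, with $\delta > 0$. Let $\mathcal{C}_{N_1}$ be the finite
collection of cubes $C \in \mathcal{D}_{N_1}(\mathbb{R}^n)$ with $\operatorname{osc}_C f > \delta/4$. These cubes have total
volume less than $\delta/4$. Now we set $\epsilon_2 = \delta/8$, and let $\mathcal{C}_{N_2}$ be the finite collection
of cubes $C \in \mathcal{D}_{N_2}(\mathbb{R}^n)$ with $\operatorname{osc}_C f > \delta/8$; these cubes have total volume less
than $\delta/8$. Continue with $\epsilon_3 = \delta/16$, ....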

Finally, consider the infinite sequence of open boxes $B_1, B_2, \dots$ obtained by
listing first the interiors of the elements of $\mathcal{C}_{N_1}$, then those of the elements of
$\mathcal{C}_{N_2}$, etc.

This almost solves our problem: the total volume of our sequence of boxes
is at most $\delta/4 + \delta/8 + \cdots = \delta/2$. The problem is that discontinuities on the
boundary of dyadic cubes may go undetected by oscillation on dyadic cubes:
as shown in Figure 4.4.4, the value of the function over one cube could be 0,
and the value over an adjacent cube could be 1; in each case the oscillation over
the cube would be 0, but the function would be discontinuous at points on the
border between the two cubes.

To deal with this, we simply shift our cubes by an irrational amount, as 
shown to the right of Figure 4.4.4, and repeat the above process. 

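To do this, we set

$$\tilde f(\mathbf{x}) = f(\mathbf{x} - \mathbf{a}), \quad\text{where } \mathbf{a} = \begin{bmatrix} \sqrt{2} \\ \vdots \\ \sqrt{2} \end{bmatrix}. \tag{4.4.7}$$

(We could translate $\mathbf{x}$ by any vector with irrational entries, or indeed by a
rational like 1/3.) Repeat the argument, to find a sequence $\tilde B_1, \tilde B_2, \dots$.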

Now translate these back: set

$$B'_i = \{\, \mathbf{x} - \mathbf{a} \mid \mathbf{x} \in \tilde B_i \,\}. \tag{4.4.8}$$

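Now we claim that the sequence $B_1, B'_1, B_2, B'_2, \dots$ solves our problem. We
have

$$\operatorname{vol}_n(B_1) + \operatorname{vol}_n(B_2) + \cdots < \frac{\delta}{4} + \frac{\delta}{8} + \cdots = \frac{\delta}{2}, \tag{4.4.9}$$

and the same bound holds for the $B'_i$,
so the total volume of the sequence $B_1, B'_1, B_2, B'_2, \dots$ is less than $\delta$.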

Now we need to show that $f$ is continuous on the complement of

$$B_1 \cup B'_1 \cup B_2 \cup B'_2 \cup \cdots, \tag{4.4.10}$$
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i.e., on M n minus the union of the B{ and B[. Indeed, if x is a point where / 
is not continuous, then at least one of $\mathbf{x}$ and $\tilde{\mathbf{x}} = \mathbf{x} + \mathbf{a}$ is in the interior of a
dyadic cube.

Suppose that the first is the case. Then there exist $N_k$ and a sequence $\mathbf{x}_i$
converging to $\mathbf{x}$ so that $|f(\mathbf{x}_i) - f(\mathbf{x})| > \delta/2^{1+N_k}$ for all $i$; in particular, that
cube will be in the set $\mathcal{C}_{N_k}$, and $\mathbf{x}$ will be in one of the $B_i$.

If $\mathbf{x}$ is not in the interior of a dyadic cube, then $\tilde{\mathbf{x}}$ is a point of discontinuity
of $\tilde f$, and the same argument applies. □


4.5 Fubini’s Theorem and Iterated Integrals 


The expression on the left-hand
side of Equation 4.5.1 doesn’t
specify the order in which the variables
are taken, so the iterated integral
on the right could be written
in any order: we could integrate
first with respect to $x_n$, or
any other variable, rather than $x_1$.
This is important for both theoretical
and computational uses of
Fubini’s theorem.


We now know — in principle, at least — how to determine whether a function is 
integrable. Assuming it is, how do we go about integrating it? Fubini’s theorem 
allows us to compute multiple integrals by hand, or at least reduce them to the 
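computation of one-dimensional integrals. It asserts that if $f : \mathbb{R}^n \to \mathbb{R}$ is
integrable, then

$$\int_{\mathbb{R}^n} f\,|d^n\mathbf{x}| = \int_{-\infty}^{\infty}\left(\cdots\left(\int_{-\infty}^{\infty} f(x_1, \dots, x_n)\,dx_1\right)\cdots\right)dx_n. \tag{4.5.1}$$
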
That is, first we hold the variables $x_2, \dots, x_n$ constant and integrate with
respect to $x_1$; then we integrate the resulting (no doubt complicated) function
with respect to $x_2$, and so on.


Remark. The above statement is not quite correct, because some of the functions
in parentheses on the right-hand side of Equation 4.5.1 may not be integrable;
this problem is discussed (Example A13.1) in Appendix A13. We state
Fubini’s theorem correctly at the end of this section. For now, just assume that
we are in the (common) situation where the above statement works. △

In practice, the main difficulty in setting up a multiple integral as an iterated 
one-dimensional integral is dealing with the “boundary” of the region over which 
we wish to integrate the function. We tried to sweep difficulties like the fractal 
coastline of Britain under the rug by choosing to integrate over all of R n , but of 
course those difficulties are still there. This is where we have to come to terms 
with them: we have to figure out the upper and lower limits of the integrals. 

If the domain of integration looks like the coastline of Britain, it is not 
at all obvious how to go about this. For domains of integration bounded by 
smooth curves and surfaces, formulas exist in many cases that are of interest 

(particularly during calculus exams), but this is still the part that gives students 
the most trouble. 

Before computing any multiple integrals, let’s see how to set them up. While 
a multiple integral is computed from inside out— first with respect to the vari- 
able in the inner parentheses— we recommend setting up the problem from 
outside in, as shown in Examples 4.5.1 and 4.5.2. 
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By “integrate over the triangle” 
we mean that we imagine that the 
function $f$ is defined by some formula
inside the triangle, and outside
the triangle $f = 0$.



Figure 4.5.1. 

The triangle defined by Equa- 
tion 4.5.2. 


Example 4.5.1 (Setting up multiple integrals: an easy example). Suppose
we want to integrate a function $f(x, y)$ over the triangle

$$T = \{\, (x, y) \in \mathbb{R}^2 \mid 0 \le 2x \le y \le 2 \,\} \tag{4.5.2}$$

shown in Figure 4.5.1. This triangle is the intersection of the three regions (in
this case, half-planes) defined by the three inequalities $0 \le x$, $2x \le y$, and $y \le 2$.

Say we want to integrate first with respect to $y$. We set up the integral as
follows, temporarily omitting the limits of integration:

$$\iint_{\mathbb{R}^2} f(x, y)\,dx\,dy = \int\left(\int f\,dy\right)dx. \tag{4.5.3}$$

(We just write $f$ for the function, as we don’t want to complicate issues by
specifying a particular function.) Starting with the outer integral — thinking 
first about x — we hold a pencil parallel to the y-axis and roll it over the triangle 
from left to right. We see that the triangle (the domain of integration) starts 
at x = 0 and ends at x = 1, so we write in those limits: 


$$\int_0^1\left(\int f\,dy\right)dx. \tag{4.5.4}$$

Once more we roll the pencil from x = 0 to x = 1, this time asking ourselves 
what are the upper and lower values of y for each value of x? The upper value 
is always y = 2. The lower value is given by the intersection of the pencil with 
the hypotenuse of the triangle, which lies on the line y = 2x. Therefore the 
lower value is $y = 2x$, and we have

$$\int_0^1\left(\int_{2x}^{2} f\,dy\right)dx. \tag{4.5.5}$$

If we want to start by integrating $f$ with respect to $x$, we write

$$\iint_{\mathbb{R}^2} f(x, y)\,dx\,dy = \int\left(\int f\,dx\right)dy \tag{4.5.6}$$

and, again starting with the outer integral, we hold our pencil parallel to the 
x-axis and roll it from the bottom of the triangle to the top, from y = 0 to 
y = 2. As we roll the pencil, we ask what are the lower and upper values of x 
for each value of y. The lower value is always x = 0, and the upper value is set 
by the hypotenuse, but we express it now in terms of $y$, getting $x = y/2$. This
gives us

$$\int_0^2\left(\int_0^{y/2} f\,dx\right)dy. \tag{4.5.7}$$
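
To see that the two ways of setting up the integral over the triangle agree, one can try them on a concrete integrand. The sketch below is only an illustration: the text leaves $f$ unspecified, so the choice $f(x, y) = xy$ is our assumption, as is the simple midpoint rule (`midpoint`, with `n` subintervals) used to approximate each one-dimensional integral. For this particular $f$, both iterated integrals can be computed by hand and equal 1/2.

```python
def midpoint(g, a, b, n=200):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

def f(x, y):
    return x * y  # hypothetical integrand; the text does not specify one

# Outer integral in x from 0 to 1, inner integral in y from 2x to 2.
dy_first = midpoint(lambda x: midpoint(lambda y: f(x, y), 2 * x, 2), 0, 1)

# Outer integral in y from 0 to 2, inner integral in x from 0 to y/2.
dx_first = midpoint(lambda y: midpoint(lambda x: f(x, y), 0, y / 2), 0, 2)

print(dy_first, dx_first)
```

The limits passed to `midpoint` are exactly the limits read off by rolling the pencil: the inner limits may depend on the outer variable, the outer limits may not.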
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a I 


Figure 4.5.2. 

The shaded area represents a 
truncated part of the triangle of 
Figure 4.5.1 



Figure 4.5.3. 

The region of integration for 
Example 4.5.2. 


Now suppose we are integrating over only part of the triangle, as shown in
Figure 4.5.2. What limits do we put in the expression $\int\left(\int f\,dy\right)dx$? Try it
yourself before checking the answer in footnote 4. △


Example 4.5.2 (Setting up multiple integrals: a somewhat harder
example). Now let’s integrate an unspecified function $f(x, y)$ over the area
bordered on the top by the parabolas $y = x^2$ and $y = (x - 2)^2$ and on the
bottom by the straight lines $y = -x$ and $y = x - 2$, as shown in Figure 4.5.3.

Let’s start again by sweeping our pencil from left to right, which corresponds
to the outer integral being with respect to $x$. The limits for the outer integral
are clearly $x = 0$ and $x = 2$, giving

$$\int_0^2\left(\int f\,dy\right)dx. \tag{4.5.8}$$


As we sweep our pencil from left to right, we see that the lower limit for $y$
is set by the straight line $y = -x$, and the upper limit by the parabola $y = x^2$,
so we are tempted to write

$$\int_0^2\left(\int_{-x}^{x^2} f\,dy\right)dx. \tag{4.5.9}$$


But once our pencil arrives at $x = 1$, we have a problem. The lower limit
is now set by the straight line $y = x - 2$, and the upper limit by the parabola
$y = (x - 2)^2$. How can we express this? Try it yourself before looking at the
answer in footnote 5 below. △


Exercise 4.5.2 asks you to set up the multiple integral for Example 4.5.2 when
the outer integral is with respect to $y$. Exercise 4.5.3 asks you to set up the
multiple integral $\int\left(\int f\,dx\right)dy$ for the truncated triangle shown in Figure 4.5.2.
In both cases the answer will be a sum of integrals.


Example 4.5.3 (A multiple integral in $\mathbb{R}^3$). As you might imagine, already
in $\mathbb{R}^3$ this kind of visualization becomes much harder. Here is an unrealistically

4 When the domain of integration is the truncated triangle in Figure 4.5.2, the
integral is written

$$\int_0^a\left(\int_{2x}^{2} f\,dy\right)dx.$$

In the other direction writing the integral is harder; we will return to it in Exercise
4.5.3.

5 We need to break up this integral into a sum of integrals:

$$\int_0^1\left(\int_{-x}^{x^2} f\,dy\right)dx + \int_1^2\left(\int_{x-2}^{(x-2)^2} f\,dy\right)dx.$$

Exercise 4.5.1 asks you to justify our ignoring that we have counted the line $x = 1$
twice.
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z 



Figure 4.5.4. 

Top: The pyramid over which we
are integrating in Example 4.5.3. 
Middle: The same pyramid, trun- 
cated at height z. Bottom: The 
plane at height z shown in the 
middle figure, put flat. 


simple example. Suppose we want to integrate a function over the pyramid $P$
shown in the top of Figure 4.5.4, and given by the formula

$$P = \{\, (x, y, z) \in \mathbb{R}^3 \mid 0 \le x;\ 0 \le y;\ 0 \le z;\ x + y + z \le 1 \,\}. \tag{4.5.10}$$



We want to figure out the limits of integration for the multiple integral

$$\iiint_{\mathbb{R}^3} f(x, y, z)\,dx\,dy\,dz = \int\left(\int\left(\int f\,dx\right)dy\right)dz. \tag{4.5.11}$$



There are six ways of applying Fubini’s theorem, which in this case because of 
the symmetries will result in the same expressions with the variables permuted. 

Let us think of varying z first, for instance by lifting a piece of paper and 
seeing how it intersects the pyramid at various heights. Clearly the paper will 
only intersect the pyramid when its height is between 0 and 1. This leads to 
writing 


$$\int_0^1 \Big(\qquad\Big)\,dz \tag{4.5.12}$$

where the space needs to be filled in by the double integral of $f$ over the part of
the pyramid $P$ at height $z$, pictured in the middle of Figure 4.5.4, and again at
the bottom, this time drawn flat.

This time we are integrating over a triangle (which depends on $z$), just as in
Example 4.5.1. Let us think of varying $y$ next (it could just as well have been
$x$), rolling a horizontal pencil up; clearly the relevant $y$-values are between 0
and $1 - z$, which leads us to write

$$\int_0^1\left(\int_0^{1-z} \Big(\qquad\Big)\,dy\right)dz \tag{4.5.13}$$


where now the space represents the integral over part of the horizontal line
segment at height $z$ and “depth” $y$ (if depth is the name of the $y$ coordinate).
These $x$-values are those between 0 and $1 - z - y$, so finally the integral is

$$\int_0^1\left(\int_0^{1-z}\left(\int_0^{1-z-y} f\,dx\right)dy\right)dz. \tag{4.5.14}$$
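
The limits found for the pyramid can be checked on a simple integrand. In the sketch below (an illustration only, with a midpoint rule of our own choosing and a hypothetical helper `integrate_over_pyramid`), taking $f = 1$ should return the volume of the pyramid, which is 1/6.

```python
def midpoint(g, a, b, n=40):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

def integrate_over_pyramid(f):
    """Iterated integral: x from 0 to 1-z-y, then y from 0 to 1-z, then z from 0 to 1."""
    return midpoint(
        lambda z: midpoint(
            lambda y: midpoint(lambda x: f(x, y, z), 0, 1 - z - y),
            0, 1 - z),
        0, 1)

volume = integrate_over_pyramid(lambda x, y, z: 1.0)
print(volume)
```

Note how each inner limit depends only on the variables of the integrals outside it, mirroring the order in which the limits were found: first $z$, then $y$, then $x$.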


Now let’s actually compute a few multiple integrals. 

Example 4.5.4 (Computing a multiple integral). Suppose we have a
function $f(x, y) = xy$ defined on the unit square, as shown in Figure 4.5.5.
Then
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In Equation 4.5.15, recall that 
to compute

$$\left[\frac{x^2 y}{2}\right]_{x=0}^{x=1}$$

we evaluate $x^2 y/2$ at both $x = 1$
and $x = 0$, subtracting the second
from the first.



Figure 4.5.5. 

The integral in Equation 4.5.15
is 1/4: the volume under the surface
defined by $f(x, y) = xy$, and
above the unit square, is 1/4.



Figure 4.5.6. 


The triangle of Example 4.5.5. 


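$$\iint_{\mathbb{R}^2} f(x, y)\,dx\,dy = \int_0^1\left(\int_0^1 xy\,dx\right)dy = \int_0^1\left[\frac{x^2 y}{2}\right]_{x=0}^{x=1} dy = \int_0^1 \frac{y}{2}\,dy = \frac{1}{4}. \tag{4.5.15}$$

△
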


In Example 4.5.4 it is clear that we could have taken the integral in the
opposite order and found the same result, since our function is $f(x, y) = xy$,
and $xy = yx$. Fubini’s theorem says that this is always true as long as the
functions involved are integrable. This fact can be useful; sometimes a multiple 
integral can be computed in elementary terms when written in one direction, 
but not in the other, as you will see in Example 4.5.5. It may also be easier 
to determine the limits of integration if the problem is set up in one direction 
rather than another, as we already saw in the case of the truncated triangle 
shown in Figure 4.5.2. 


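Example 4.5.5 (Choose the easy direction). Let us integrate the function
$e^{-y^2}$ over the triangle shown in Figure 4.5.6:

$$T = \{\, (x, y) \in \mathbb{R}^2 \mid 0 \le x \le y \le 1 \,\}. \tag{4.5.16}$$

Fubini’s theorem gives us two ways of writing this integral as an iterated one-dimensional
integral:

$$(1)\quad \int_0^1\left(\int_x^1 e^{-y^2}\,dy\right)dx \qquad\text{and}\qquad (2)\quad \int_0^1\left(\int_0^y e^{-y^2}\,dx\right)dy. \tag{4.5.17}$$

The first cannot be computed in elementary terms, since $e^{-y^2}$ does not have
an elementary anti-derivative.

But the second can:

$$\int_0^1\left(\int_0^y e^{-y^2}\,dx\right)dy = \int_0^1 y\,e^{-y^2}\,dy = -\frac{1}{2}\left[e^{-y^2}\right]_0^1 = \frac{1}{2}\left(1 - \frac{1}{e}\right). \quad\triangle \tag{4.5.18}$$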
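
Although the first form in Equation 4.5.17 has no elementary antiderivative, it can still be evaluated numerically, and it should agree with the closed form obtained from the second. The sketch below does this with a plain midpoint rule; the helper `midpoint` and the number of subintervals are our own illustrative choices, not a recommended numerical method.

```python
import math

def midpoint(g, a, b, n=200):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

# Form (1): inner integral in y from x to 1, outer integral in x from 0 to 1.
form_1 = midpoint(lambda x: midpoint(lambda y: math.exp(-y * y), x, 1), 0, 1)

# Form (2): the inner integral in x from 0 to y is just y * exp(-y^2).
form_2 = midpoint(lambda y: y * math.exp(-y * y), 0, 1)

closed_form = 0.5 * (1 - 1 / math.e)
print(form_1, form_2, closed_form)
```

The three numbers agree to several decimal places, which is what Fubini's theorem leads us to expect.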

Older textbooks contain many examples of this sort of computational mira- 
cle. We are not sure the phenomenon was ever very important, but today it is 
sounder to take a serious interest in the numerical theory, and go lightly over 
computational tricks, which do not work in any great generality in any case. 


Example 4.5.6 (Volume of a ball in $\mathbb{R}^n$). Let $B_R^n(0)$ be the ball of radius
$R$ in $\mathbb{R}^n$, centered at 0, and let $b_n(R)$ be its volume. Clearly $b_n(R) = R^n b_n(1)$.
We will denote by $\beta_n = b_n(1)$ the volume of the unit ball.
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You should learn how to han- 
dle simple examples using Pubini’s 
theorem, and you should learn 
some of the standard tricks that 
work in some more complicated 
situations; these will be handy on 
exams, particularly in physics and 
engineering classes. But in real life 
you are likely to come across nas- 
tier problems, which even a profes- 
sional mathematician would have 
trouble solving “by hand”; most 
often you will want to use a com- 
puter to compute integrals for you. 
We discuss numerical methods of 
computing integrals in Section 4.6. 


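By Fubini’s theorem,

$$\begin{aligned}
\underbrace{\beta_n}_{\substack{\text{vol. of unit}\\ \text{ball in } \mathbb{R}^n}}
&= \int_{B_1^n(0)} |d^n\mathbf{x}|
 = \int_{-1}^{1} \underbrace{\left(\int_{B^{n-1}_{\sqrt{1-x_n^2}}(0)} |d^{n-1}\mathbf{x}|\right)}_{\substack{(n-1)\text{-dimensional vol.}\\ \text{of one slice of } B_1^n(0)}} dx_n \\
&= \int_{-1}^{1} \underbrace{b_{n-1}\!\left(\sqrt{1-x_n^2}\right)}_{\substack{\text{vol. ball of radius}\\ \sqrt{1-x_n^2} \text{ in } \mathbb{R}^{n-1}}} dx_n
 = \int_{-1}^{1} \left(1-x_n^2\right)^{\frac{n-1}{2}} \underbrace{\beta_{n-1}}_{\substack{\text{vol. ball of}\\ \text{radius 1 in } \mathbb{R}^{n-1}}} dx_n \\
&= \underbrace{\beta_{n-1}}_{\substack{\text{vol. of unit}\\ \text{ball in } \mathbb{R}^{n-1}}} \int_{-1}^{1} \left(1-x_n^2\right)^{\frac{n-1}{2}} dx_n.
\end{aligned} \tag{4.5.19}$$
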

This reduces the computation of $\beta_n$ to computing the integral

$$c_n = \int_{-1}^{1} \left(1 - t^2\right)^{\frac{n-1}{2}} dt. \tag{4.5.20}$$



This is a standard tricky problem from one-variable calculus: Exercise 4.5.4 (a)
asks you to show that

$$c_n = \frac{n-1}{n}\,c_{n-2}, \quad\text{for } n \ge 2. \tag{4.5.21}$$

So if we can compute $c_0$ and $c_1$, we can compute all the other $c_n$. Exercise
4.5.4 (b) asks you to show that $c_0 = \pi$ and $c_1 = 2$ (the second is pretty easy).


The ball $B_1^n(0)$ is the ball of
radius 1 in $\mathbb{R}^n$, centered at the
origin;

$$B^{n-1}_{\sqrt{1-x_n^2}}(0)$$

is the ball of radius $\sqrt{1 - x_n^2}$ in
$\mathbb{R}^{n-1}$, still centered at the origin.

In the first line of Equation
4.5.19 we imagine slicing the $n$-dimensional
ball horizontally and
computing the $(n - 1)$-dimensional
volume of each slice.


n     c_n = ((n-1)/n) c_{n-2}     Volume of ball β_n = c_n β_{n-1}

0     π
1     2                           2
2     π/2                         π
3     4/3                         4π/3
4     3π/8                        π²/2
5     16/15                       8π²/15


FIGURE 4.5.7. Computing the volume of a ball in $\mathbb{R}^1$ through $\mathbb{R}^5$.
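
The table can be generated mechanically from the recursion. The Python sketch below is one way to do it, with function names `c` and `beta` chosen by us; as an independent check (an addition of ours, not part of the text's argument) it compares each $\beta_n$ with the standard closed form $\pi^{n/2} / \Gamma(n/2 + 1)$ for the volume of the unit ball.

```python
import math

def c(n):
    """c_n from c_0 = pi, c_1 = 2 and the recursion c_n = (n - 1) / n * c_{n-2}."""
    if n == 0:
        return math.pi
    if n == 1:
        return 2.0
    return (n - 1) / n * c(n - 2)

def beta(n):
    """Volume of the unit ball in R^n, from beta_1 = 2 and beta_n = c_n * beta_{n-1}."""
    if n == 1:
        return 2.0
    return c(n) * beta(n - 1)

for n in range(1, 6):
    closed_form = math.pi ** (n / 2) / math.gamma(n / 2 + 1)
    print(n, c(n), beta(n), closed_form)
```

Extending the range in the loop continues the table to higher dimensions.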
Choosing a random parallelogram: one dart lands at
$\begin{pmatrix} x_1 \\ y_1 \end{pmatrix}$, the other at $\begin{pmatrix} x_2 \\ y_2 \end{pmatrix}$.

Remember that in $\mathbb{R}^2$, the determinant is

\[ \det \begin{bmatrix} x_1 & x_2 \\ y_1 & y_2 \end{bmatrix} = x_1 y_2 - x_2 y_1. \]

We have $d^2\mathbf{x}$ and $d^2\mathbf{y}$ because $\mathbf{x}$ and $\mathbf{y}$ have two coordinates:
$d^2\mathbf{x} = dx_1\,dx_2$ and $d^2\mathbf{y} = dy_1\,dy_2$.

Saying that we are choosing our points at random in the square
means that our density of probability for each dart is the
characteristic function of the square.

An integrand is what comes after an integral sign: for $\int x\,dx$,
the integrand is $x\,dx$. In Equation 4.5.24, the integrand for the
innermost integral is $|x_1 y_2 - x_2 y_1|\,dy_2$; the integrand for the
integral immediately to the left of the innermost integral is

\[ \Bigl( \int_0^1 |x_1 y_2 - x_2 y_1|\,dy_2 \Bigr)\,dx_2. \]



This allows us to make the table of Figure 4.5.7. It is easy to continue the
table (what is $\beta_6$? Check below.⁶) If you enjoy inductive proofs, you might try
Exercise 4.5.5, which asks you to show that

\[ \beta_{2k} = \frac{\pi^k}{k!} \quad\text{and}\quad \beta_{2k+1} = \frac{\pi^k\, k!\, 2^{2k+1}}{(2k+1)!}. \tag{4.5.22} \]
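As a numerical sanity check, the recursion of Equation 4.5.21 and the table of Figure 4.5.7 can be sketched in a few lines of Python. This is only an illustration: the function names `c`, `ball_volume` and `closed_form` are ours, not the book's, and the starting value $\beta_1 = 2$ (the length of the segment $[-1, 1]$) is read off the table.

```python
import math

def c(n):
    """c_n from Equation 4.5.21: c_n = (n - 1)/n * c_{n-2}, with c_0 = pi, c_1 = 2."""
    if n == 0:
        return math.pi
    if n == 1:
        return 2.0
    return (n - 1) / n * c(n - 2)

def ball_volume(n):
    """beta_n = c_n * beta_{n-1}, starting from beta_1 = 2."""
    beta = 2.0
    for k in range(2, n + 1):
        beta *= c(k)
    return beta

def closed_form(n):
    """Equation 4.5.22, split according to the parity of n."""
    k = n // 2
    if n % 2 == 0:
        return math.pi ** k / math.factorial(k)
    return math.pi ** k * math.factorial(k) * 2 ** (2 * k + 1) / math.factorial(2 * k + 1)

# Print n, the recursive value, and the closed-form value side by side.
for n in range(1, 7):
    print(n, ball_volume(n), closed_form(n))
```

The two columns agree to rounding error, and the row for n = 6 gives the value asked for in the text.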


Computing probabilities using integrals 

As we mentioned in Section 4.1, an important use of integrals is in computing 
probabilities. 


Example 4.5.7 (Using Fubini to compute a probability). Choose at
random two pairs of positive numbers between 0 and 1 and use those numbers
as the coordinates $(x_1, y_1)$, $(x_2, y_2)$ of two vectors anchored at the origin, as
shown in Figure 4.5.8. (You might imagine throwing a dart at the unit square.)
What is the expected (average) area of the parallelogram spanned by those
vectors? In other words, what is the expected value of the absolute value of the
determinant?

This average is

\[ \int_C \underbrace{|x_1 y_2 - x_2 y_1|}_{|\det|}\, |d^2\mathbf{x}|\,|d^2\mathbf{y}|, \tag{4.5.23} \]

where $C$ is the unit cube in $\mathbb{R}^4$. (Each possible parallelogram corresponds
to two points in the unit square, each with two coordinates, so each point in
$C \subset \mathbb{R}^4$ corresponds to one parallelogram.) Our computation will be simpler if
we consider only the cases $x_1 > y_1$; i.e., we assume that our first dart lands below
the diagonal of the square. Since the diagonal divides the square symmetrically,
the cases where the first dart lands below the diagonal and the cases where it
lands above contribute the same amount to the integral. Thus we want to
compute twice the quadruple integral

\[ \int_0^1 \int_0^{x_1} \int_0^1 \int_0^1 |x_1 y_2 - x_2 y_1|\, dy_2\, dx_2\, dy_1\, dx_1. \tag{4.5.24} \]

(Note that the integral $\int_0^{x_1}$ goes with $dy_1$: the innermost integral goes with the
innermost integrand, and so on. The second integral is $\int_0^{x_1}$ because $y_1 < x_1$.)

Now we would like to get rid of the absolute values, by considering separately
the case where $\det = x_1 y_2 - x_2 y_1$ is negative, and the case where it is positive.
Observe that when $y_2 < y_1 x_2 / x_1$, the determinant is negative, whereas when
$y_2 > y_1 x_2 / x_1$ it is positive. Another way to say this is that on one side of the


⁶ $c_6 = \frac{5}{6} c_4 = \frac{15\pi}{48}$, so $\beta_6 = c_6 \beta_5 = \frac{\pi^3}{6}$.
Figure 4.5.9.

The arrow represents the first dart. If the second dart (with
coordinates $x_2, y_2$) lands in the shaded area, the determinant will
be negative. Otherwise it will be positive.



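line $y_2 = \frac{y_1}{x_1} x_2$ (the shaded side in Figure 4.5.9) the determinant is negative,
and on the other side it is positive.

Since we have assumed that the first dart lands below the diagonal of the
square, then whatever the value of $x_2$, when we integrate with respect to $y_2$,
we will have two choices: if $y_2$ is in the shaded part, the determinant will be
negative; otherwise it will be positive. So we break up the innermost integral
into two parts:

\[ \int_0^1 \int_0^{x_1} \int_0^1 \Bigl( \underbrace{\int_0^{(y_1 x_2)/x_1} (x_2 y_1 - x_1 y_2)\, dy_2}_{-\det} + \underbrace{\int_{(y_1 x_2)/x_1}^{1} (x_1 y_2 - x_2 y_1)\, dy_2}_{+\det} \Bigr)\, dx_2\, dy_1\, dx_1. \tag{4.5.25} \]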

(If we had not restricted the first dart to below the diagonal, we would have
the situation of Figure 4.5.10, and our integral would be a bit more
complicated.⁷)

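The rest of the computation is a matter of carefully computing four ordinary
integrals, keeping straight what is constant and what is the variable of
integration at each step. First we compute the inner integral, with respect to $y_2$. The
first term gives

\[ \Bigl[ x_2 y_1 y_2 - x_1 \frac{y_2^2}{2} \Bigr]_0^{y_1 x_2 / x_1}
 = \underbrace{\frac{x_2^2 y_1^2}{x_1} - \frac{y_1^2 x_2^2}{2 x_1}}_{\text{eval. at } y_2 = y_1 x_2 / x_1} - \underbrace{0}_{\text{eval. at } y_2 = 0}
 = \frac{1}{2} \frac{y_1^2 x_2^2}{x_1}. \]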

The second gives 


If we had not restricted the first dart to below the diagonal, then
for values of $x_2$ to the left of the vertical dotted line, the sign of the
determinant would depend on the value of $y_2$. For values of $x_2$ to the
right of the vertical dotted line, the determinant would be negative
for all values of $y_2$.


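\[ \Bigl[ x_1 \frac{y_2^2}{2} - x_2 y_1 y_2 \Bigr]_{y_1 x_2 / x_1}^{1}
 = \underbrace{\Bigl( \frac{x_1}{2} - x_2 y_1 \Bigr)}_{\text{eval. at } y_2 = 1} - \underbrace{\Bigl( \frac{x_2^2 y_1^2}{2 x_1} - \frac{x_2^2 y_1^2}{x_1} \Bigr)}_{\text{eval. at } y_2 = y_1 x_2 / x_1}
 = \frac{x_1}{2} - x_2 y_1 + \frac{x_2^2 y_1^2}{2 x_1}. \tag{4.5.26} \]

Continuing with Equation 4.5.25, we get


⁷ In that case we would write

\[ \int_0^1 \int_{x_1}^1 \Biggl( \int_0^{x_1 / y_1} \Bigl( \int_0^{(y_1 x_2)/x_1} (x_2 y_1 - x_1 y_2)\, dy_2 + \int_{(y_1 x_2)/x_1}^{1} (x_1 y_2 - x_2 y_1)\, dy_2 \Bigr)\, dx_2
 + \int_{x_1 / y_1}^{1} \Bigl( \int_0^1 (x_2 y_1 - x_1 y_2)\, dy_2 \Bigr)\, dx_2 \Biggr)\, dy_1\, dx_1. \]

The first integral with respect to $x_2$ corresponds to values of $x_2$ to the left of the
vertical dotted line in Figure 4.5.10; the second corresponds to values of $x_2$ to the
right of that line. Exercise 4.5.7 asks you to compute the integral this way.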
What if we choose at random three vectors in the unit cube?
Then we would be integrating over a nine-dimensional cube. Daunted
by such an integral, we might be tempted to use a Riemann sum.
But even if we used a rather coarse decomposition the computation is
forbidding. Say we divide each side into 10, choosing (for example)
the midpoint of each minicube. We turn the nine coordinates of that
point into a 3 × 3 matrix and compute its determinant. That gives
$10^9$ determinants to compute, a billion determinants, each requiring
18 multiplications and five additions.

Go up one more dimension and 
the computation is really out of 
hand. Yet physicists like Nobel 
laureate Kenneth Wilson routinely 
work with integrals in dimensions 
of thousands or more. Actually 
carrying out the computations is 
clearly impossible. The technique 
most often used is a sophisticated 
version of throwing dice, known as 
Monte Carlo integration. It is dis- 
cussed in Section 4.6. 


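\[
\begin{aligned}
\int_0^1 \int_0^{x_1} \int_0^1 & \Bigl( \int_0^{(y_1 x_2)/x_1} (x_2 y_1 - x_1 y_2)\, dy_2 + \int_{(y_1 x_2)/x_1}^{1} (x_1 y_2 - x_2 y_1)\, dy_2 \Bigr)\, dx_2\, dy_1\, dx_1 \\
 &= \int_0^1 \int_0^{x_1} \int_0^1 \Bigl( \frac{x_1}{2} - x_2 y_1 + \frac{x_2^2 y_1^2}{x_1} \Bigr)\, dx_2\, dy_1\, dx_1 \\
 &= \int_0^1 \int_0^{x_1} \Bigl[ \frac{x_1 x_2}{2} - \frac{x_2^2 y_1}{2} + \frac{x_2^3 y_1^2}{3 x_1} \Bigr]_{x_2 = 0}^{1}\, dy_1\, dx_1
  = \int_0^1 \int_0^{x_1} \Bigl( \frac{x_1}{2} - \frac{y_1}{2} + \frac{y_1^2}{3 x_1} \Bigr)\, dy_1\, dx_1 \\
 &= \int_0^1 \Bigl[ \frac{x_1 y_1}{2} - \frac{y_1^2}{4} + \frac{y_1^3}{9 x_1} \Bigr]_{y_1 = 0}^{x_1}\, dx_1
  = \frac{13}{36} \int_0^1 x_1^2\, dx_1 = \frac{13}{36} \Bigl[ \frac{x_1^3}{3} \Bigr]_0^1 = \frac{13}{108}.
\end{aligned}
\tag{4.5.27}
\]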


So the expected area is twice 13/108, i.e., 13/54, or slightly less than 1/4. 
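The value 13/54 ≈ 0.2407 can be checked by literally throwing darts. The sketch below is not the book's program: it is a minimal simulation using Python's standard `random` module, and the function name, the sample count and the seed are arbitrary choices of ours.

```python
import random

def expected_parallelogram_area(num_samples, seed=0):
    """Average |x1*y2 - x2*y1| over random points (x1, y1), (x2, y2) in the unit square."""
    rng = random.Random(seed)  # fixed seed so that the run is reproducible
    total = 0.0
    for _ in range(num_samples):
        x1, y1, x2, y2 = (rng.random() for _ in range(4))
        total += abs(x1 * y2 - x2 * y1)
    return total / num_samples

estimate = expected_parallelogram_area(200_000)
print(estimate, 13 / 54)
```

With a couple of hundred thousand darts the estimate typically agrees with 13/54 to two or three decimals, which is about all a simulation of this size can be expected to deliver.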


Stating Fubini’s theorem more precisely 

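We will now give a precise statement of Fubini's theorem. The statement is
not as strong as what we prove in Appendix A.13, but it keeps the statement
simpler.


Theorem 4.5.8 (Fubini's theorem). Let $f$ be an integrable function on
$\mathbb{R}^n \times \mathbb{R}^m$, and suppose that for each $\mathbf{x} \in \mathbb{R}^n$, the function $\mathbf{y} \mapsto f(\mathbf{x}, \mathbf{y})$ is
integrable. Then the function

\[ \mathbf{x} \mapsto \int_{\mathbb{R}^m} f(\mathbf{x}, \mathbf{y})\, |d^m\mathbf{y}| \]

is integrable, and

\[ \int_{\mathbb{R}^{n+m}} f(\mathbf{x}, \mathbf{y})\, |d^n\mathbf{x}|\,|d^m\mathbf{y}| = \int_{\mathbb{R}^n} \Bigl( \int_{\mathbb{R}^m} f(\mathbf{x}, \mathbf{y})\, |d^m\mathbf{y}| \Bigr)\, |d^n\mathbf{x}|. \]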


4.6 Numerical Methods of Integration 

In a great many cases, Fubini's theorem does not lead to expressions that can
be calculated in closed form, and integrals must be computed numerically.
In one dimension, this subject has been extensively investigated, and there is
an enormous literature on it. In higher dimensions, the literature is
still extensive but the field is not nearly so well known. We will begin with a
reminder about the one-dimensional case.
Our late colleague Milton Abramowitz used to say, somewhat in jest,
that 95 percent of all practical work in numerical analysis boiled
down to applications of Simpson's rule and linear interpolation.

-Philip J. Davis and Philip Rabinowitz, Methods of Numerical
Integration, p. 45


Here in speaking of weights starting and ending with 1, and so
forth, we are omitting the factor of $(b - a)/6n$.

Why do we multiply by $(b - a)/6n$? Think of the integral as
the sum of the area of n "rectangles," each with width $(b - a)/n$,
i.e., the total length divided by the number of "rectangles".
Multiplying by $(b - a)/n$ gives the width of each rectangle. The
height of each "rectangle" should be some sort of average of the
value of the function over the interval, but in fact we have
weighted that value by 6. Dividing by 6 corrects for that weight.


One-dimensional integrals 

In first year calculus you probably heard of the trapezoidal rule and of Simpson’s 
rule for computing ordinary integrals (and quite likely you’ve forgotten them 
too). The trapezoidal rule is not of much practical interest, but Simpson’s 
rule is probably good enough for anything you will need unless you become an 
engineer or physicist. In it, the function is sampled at regular intervals and 
different “weights” are assigned the samples. 

Definition 4.6.1 (Simpson's rule). Let $f$ be a function on $[a, b]$, choose an
integer $n$, and sample $f$ at $2n + 1$ equally distributed points $x_0, x_1, \dots, x_{2n}$,
where $x_0 = a$ and $x_{2n} = b$. Then Simpson's approximation to
$\int_a^b f(x)\,dx$ in $n$ steps is

\[ S^n_{[a,b]}(f) = \frac{b - a}{6n} \bigl( f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + \cdots + 4f(x_{2n-1}) + f(x_{2n}) \bigr). \]


For example, if $n = 3$, $a = -1$ and $b = 1$, then we divide the interval $[-1, 1]$
into six equal parts and compute

\[ \frac{1}{9} \bigl( f(-1) + 4f(-2/3) + 2f(-1/3) + 4f(0) + 2f(1/3) + 4f(2/3) + f(1) \bigr). \tag{4.6.1} \]

Why do the weights start and end with 1, and alternate between 4 and 2 for
the intermediate samples? As shown in Figure 4.6.1, the pattern of weights is
not 1, 4, 2, ..., 4, 1 but 1, 4, 1: each 1 that is not an endpoint is counted twice,
so it becomes the number 2. We are actually breaking up the interval into $n$
subintervals, and integrating the function over each subpiece:

\[ \int_a^b f(x)\,dx = \int_a^{x_2} f(x)\,dx + \int_{x_2}^{x_4} f(x)\,dx + \cdots. \tag{4.6.2} \]

Each of these $n$ sub-integrals is computed by sampling the function at the
beginning point and endpoint of the subpiece (with weight 1) and at the center
of the subpiece (with weight 4), giving a total of 6.
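Definition 4.6.1 translates almost word for word into code. The following Python sketch is ours (the name `simpson` and its argument order are assumptions, not anything defined in the text); it samples the function at the $2n + 1$ points, applies the weights 1, 4, 2, ..., 4, 1, and multiplies by $(b - a)/6n$.

```python
def simpson(f, a, b, n):
    """Simpson's approximation to the integral of f over [a, b] in n steps."""
    h = (b - a) / (2 * n)            # spacing between consecutive sample points
    total = f(a) + f(b)              # the two endpoints get weight 1
    for i in range(1, 2 * n):
        # odd-numbered samples are centers of subpieces (weight 4);
        # even-numbered ones are shared endpoints, counted twice (weight 2)
        weight = 4 if i % 2 == 1 else 2
        total += weight * f(a + i * h)
    return (b - a) / (6 * n) * total

# n = 3 on [-1, 1]: samples at -1, -2/3, -1/3, 0, 1/3, 2/3, 1, prefactor 1/9.
# The integrand is a cubic whose integral over [-1, 1] is 2/3.
print(simpson(lambda x: x**3 - 2 * x**2 + 1, -1.0, 1.0, 3))
```

For a cubic the printed value matches the exact integral up to rounding; for $x^4$ with $n = 1$ the same function returns 2/3 rather than 2/5, so the rule is no longer exact at degree 4.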


Theorem 4.6.2 (Simpson's rule). (a) If $f$ is a piecewise cubic function,
exactly equal to a cubic polynomial on the intervals $[x_{2i}, x_{2i+2}]$, then
Simpson's rule computes the integral exactly.

(b) If a function $f$ is four times continuously differentiable, then there
exists $c \in (a, b)$ such that

\[ S^n_{[a,b]}(f) - \int_a^b f(x)\,dx = \frac{(b - a)^5}{2880\, n^4}\, f^{(4)}(c). \tag{4.6.3} \]





Theorem 4.6.2 tells when Simpson's rule computes the integral
exactly, and when it gives an approximation.

Simpson's rule is a fourth-order method; the error (if $f$ is
sufficiently differentiable) is of order $h^4$, where $h$ is the step size.


FIGURE 4.6.1. To compute the integral of a function $f$ over $[a, b]$, Simpson's rule
breaks the interval into $n$ pieces. Within each piece, the function is evaluated at
the beginning, the midpoint, and the endpoint, with weight 1 for the beginning and
endpoint, and weight 4 for the midpoint. The endpoint of one interval is the beginning
point of the next, so it is counted twice and gets weight 2. At the end, the result is
multiplied by $(b - a)/6n$.

Proof. Figure 4.6.2 proves part (a); in it, we compute the integral for constant, 
linear, quadratic, and cubic functions, over the interval [—1,1], with n = 1. 
Simpson’s rule gives the same result as computing the integral directly. 


By cubic polynomial we mean 
polynomials of degree up to and 
including 3: constant functions, 
linear functions, quadratic polyno- 
mials and cubic polynomials. 

If we can split the domain of 
integration into smaller intervals 
such that a function / is exactly 
equivalent to a cubic polynomial 
over each interval, then Simpson's 
rule will compute the integral of / 
exactly. 



  Function        Simpson's rule                     Integration
                  $\frac13 (f(-1) + 4f(0) + f(1))$   $\int_{-1}^{1} f(x)\,dx$
  $f(x) = 1$      $\frac13 (1 + 4 + 1) = 2$          $2$
  $f(x) = x$      $0$                                $0$
  $f(x) = x^2$    $\frac13 (1 + 0 + 1) = 2/3$        $\int_{-1}^{1} x^2\,dx = 2/3$
  $f(x) = x^3$    $0$                                $0$


Figure 4.6.2. Using Simpson's rule to integrate a cubic function gives the exact
answer.


A proof of part (b) is sketched in Exercise 4.6.8. 

Of course, you don’t often encounter in real life a piecewise cubic polynomial 
(the exception being computer graphics). Usually, Simpson’s method is used 
to approximate integrals, not to compute them exactly. 

Example 4.6.3 (Approximating integrals with Simpson's rule). Use
Simpson's rule with $n = 100$ to compute

\[ \int_1^4 \frac{1}{x}\,dx = \log 4 = 2 \log 2, \tag{4.6.4} \]
In the world of computer graphics, piecewise cubic polynomials
are everywhere. When you construct a smooth curve using a
drawing program, what the computer is actually making is a
piecewise cubic curve, usually using the Bezier algorithm for cubic
interpolation. The curves drawn this way are known as cubic splines.

When first drawing curves with such a program it comes as a
surprise how few control points are needed.


By "normalize" we mean that we choose a definite domain of
integration rather than an interval $[a, b]$; the domain $[-1, 1]$ allows us
to take advantage of even and odd properties.


which is infinitely differentiable. Since $f^{(4)}(x) = 24/x^5$, which is largest at $x = 1$,
and $b - a = 3$, Theorem 4.6.2 asserts that the result will be correct to within

\[ \frac{24 \cdot 3^5}{2880 \cdot 100^4} = 2.025 \cdot 10^{-8}, \tag{4.6.5} \]

so at least seven decimals will be correct. △
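The bound of Equation 4.6.5 is easy to test numerically. The sketch below is an illustration only: it uses a compact Simpson routine of our own (not the book's program) and compares the actual error for $n = 100$ with the bound $(b-a)^5 \max|f^{(4)}| / (2880\,n^4)$ from Theorem 4.6.2.

```python
import math

def simpson(f, a, b, n):
    """Simpson's approximation in n steps, with weights 1, 4, 2, ..., 4, 1."""
    h = (b - a) / (2 * n)
    inner = sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, 2 * n))
    return (b - a) / (6 * n) * (f(a) + inner + f(b))

a, b, n = 1.0, 4.0, 100
approx = simpson(lambda x: 1.0 / x, a, b, n)
error = abs(approx - math.log(4.0))

# The fourth derivative of 1/x is 24/x^5, whose maximum on [1, 4] is 24 (at x = 1).
bound = (b - a) ** 5 / (2880 * n ** 4) * 24.0
print(approx, error, bound)
```

The observed error comes out well below the bound, as expected: the bound uses the largest value of $f^{(4)}$ on the interval, whereas the theorem only involves its value at some intermediate point $c$.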

The integral of Example 4.6.3 can be approximated to the same precision 
with far fewer evaluations, using Gaussian rules. 


Gaussian rules 


Simpson’s rule integrates cubic polynomials exactly. Gaussian rules are de- 
signed to integrate higher degree polynomials exactly with the smallest number 
of function evaluations possible. Let us normalize the problem as follows, 
integrating from -1 to 1: 

Find points $x_1, \dots, x_m$ and weights $w_1, \dots, w_m$ with $m$ as small as possible
so that, for all polynomials $p$ of degree $\le d$,

\[ \int_{-1}^{1} p(x)\,dx = \sum_{i=1}^{m} w_i\, p(x_i). \tag{4.6.6} \]

We will require that the points $x_i$ satisfy $-1 \le x_i \le 1$ and that $w_i > 0$ for all $i$.

Think first of how many unknowns we have, and how many equations: the
requirement of Equation 4.6.6 for each of the polynomials $1, x, \dots, x^d$ gives $d + 1$
equations for the $2m$ unknowns $x_1, \dots, x_m, w_1, \dots, w_m$, so we can reasonably
hope that the equations might have a solution when $2m \ge d + 1$.


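In Equation 4.6.7 we are integrating $\int_{-1}^{1} x^n\,dx$ for $n$ from 0 to 3, using

\[ \int_{-1}^{1} x^n\,dx = \begin{cases} 0 & \text{if } n \text{ is odd} \\ \dfrac{2}{n+1} & \text{if } n \text{ is even.} \end{cases} \]


Example 4.6.4 (Gaussian rules). The simplest case (already interesting) is
when $d = 3$ and $m = 2$. Showing that this integral is exact for polynomials of
degree $\le 3$ amounts to the four equations

\[
\begin{aligned}
\text{for } f(x) = 1: &\quad w_1 + w_2 = 2 \\
\text{for } f(x) = x: &\quad w_1 x_1 + w_2 x_2 = 0 \\
\text{for } f(x) = x^2: &\quad w_1 x_1^2 + w_2 x_2^2 = \tfrac{2}{3} \\
\text{for } f(x) = x^3: &\quad w_1 x_1^3 + w_2 x_2^3 = 0.
\end{aligned}
\tag{4.6.7}
\]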

This is a system of four nonlinear equations in four unknowns, and it looks
intractable, but in this case it is fairly easy to solve by hand: first, observe that
if we set $x_1 = -x_2 = x > 0$ and $w_1 = w_2 = w$, making the formula symmetric
around the origin, then the second and fourth equations are automatically
satisfied, and the other two become

\[ 2w = 2 \quad\text{and}\quad 2wx^2 = 2/3, \tag{4.6.8} \]

i.e., $w = 1$ and $x = 1/\sqrt{3}$. △
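The solution $w = 1$, $x = 1/\sqrt{3}$ can be checked directly: with only two function evaluations, the rule reproduces the integrals of $1, x, x^2, x^3$ over $[-1, 1]$. The Python sketch below is ours (the name `gauss2` is an assumption made for illustration).

```python
import math

def gauss2(f):
    """Two-point Gaussian rule on [-1, 1]: nodes -1/sqrt(3) and 1/sqrt(3), both weights 1."""
    x = 1.0 / math.sqrt(3.0)
    return f(-x) + f(x)

# The exact integrals over [-1, 1] of 1, x, x^2, x^3 are 2, 0, 2/3, 0.
for degree, exact in enumerate([2.0, 0.0, 2.0 / 3.0, 0.0]):
    print(degree, gauss2(lambda t, d=degree: t ** d), exact)

# Degree 4 is no longer exact: the rule gives 2/9, while the integral is 2/5.
print(gauss2(lambda t: t ** 4), 2.0 / 5.0)
```

Simpson's rule with $n = 1$ needs three evaluations to achieve the same degree of exactness.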
Exercises 4.6.4 and 4.6.5 invite you to explore how Gaussian
integration can be adapted to such integrals.


Remark. This means that whenever we have a piecewise cubic polynomial
function, we can integrate it exactly by sampling it at two points per piece.
For a piece corresponding to the interval $[-1, 1]$, the samples should be taken
at $-1/\sqrt{3}$ and $1/\sqrt{3}$, with equal weights. Exercise 4.6.3 asks you to say where
the samples should be taken for a piece corresponding to an arbitrary interval
$[a, b]$. △

If $m = 2k$ is even, we can do something similar, making the formula symmetric
about the origin and considering only the integral from 0 to 1. This allows
us to cut the number of variables in half; instead of $4k$ variables ($2k$ $w$'s and
$2k$ $x$'s), we have $2k$ variables. We then consider the system of $2k$ equations

\[
\begin{aligned}
w_1 + w_2 + \cdots + w_k &= 1 \\
w_1 x_1^2 + w_2 x_2^2 + \cdots + w_k x_k^2 &= \frac{1}{3} \\
&\;\;\vdots \\
w_1 x_1^{4k-2} + w_2 x_2^{4k-2} + \cdots + w_k x_k^{4k-2} &= \frac{1}{4k-1}.
\end{aligned}
\tag{4.6.9}
\]

If this system has a solution, then the corresponding integration rule gives 
the approximation 


We say that Newton’s method 
works “reasonably” well because 
you need to start with a fairly 
good initial guess in order for the 
procedure to converge; some ex- 
periments are suggested in Exer- 
cise 4.6.2. 


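\[ \int_{-1}^{1} f(x)\,dx \approx \sum_{i=1}^{k} w_i \bigl( f(x_i) + f(-x_i) \bigr), \tag{4.6.10} \]

and this formula will be exact for all polynomials of degree $\le 4k - 1$: the even
powers up to $4k - 2$ are handled by Equation 4.6.9, and the odd powers integrate
to 0 on both sides by symmetry.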

A lot is known about solving the system of Equation 4.6.9. The principal
theorem states that there is a unique solution to the equations with
$0 < x_1 < \cdots < x_k < 1$, and that then all the $w_i$ are positive. The main tool is the theory
of orthogonal polynomials, which we don't discuss in this volume. Another
approach is to use Newton's method, which works reasonably well for $k \le 6$ (as
far as we have looked).
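For $k = 2$ the solution of Equation 4.6.9 is known in closed form: it is the classical four-point Gauss-Legendre rule. The sketch below does not run Newton's method; it simply plugs those standard values (quoted from general knowledge of Gaussian quadrature, not from this text) into the four equations and checks that the residuals vanish. The helper names `residuals` and `symmetric_rule` are ours.

```python
import math

# Standard four-point Gauss-Legendre data, normalized as in Equation 4.6.9
# (weights sum to 1 because only the half-interval [0, 1] is considered).
r = (2.0 / 7.0) * math.sqrt(6.0 / 5.0)
x = [math.sqrt(3.0 / 7.0 - r), math.sqrt(3.0 / 7.0 + r)]          # 0 < x_1 < x_2 < 1
w = [(18.0 + math.sqrt(30.0)) / 36.0, (18.0 - math.sqrt(30.0)) / 36.0]

def residuals(w, x):
    """Left side minus right side of each of the 2k equations of 4.6.9."""
    k = len(w)
    return [sum(wi * xi ** (2 * j) for wi, xi in zip(w, x)) - 1.0 / (2 * j + 1)
            for j in range(2 * k)]

def symmetric_rule(f, w, x):
    """The approximation of Equation 4.6.10."""
    return sum(wi * (f(xi) + f(-xi)) for wi, xi in zip(w, x))

print(residuals(w, x))                                     # all close to 0
print(symmetric_rule(lambda t: t ** 6, w, x), 2.0 / 7.0)   # exact at degree 6
print(symmetric_rule(lambda t: t ** 8, w, x), 2.0 / 9.0)   # not exact at degree 8
```

A Newton iteration for the same system would be judged by the same residuals; as the text notes, its success depends on a reasonable initial guess.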

Gaussian rules are well adapted to problems where we need to integrate
functions with a particular weight, such as

\[ \int_0^\infty f(x)\, e^{-x}\,dx \quad\text{or}\quad \int_{-1}^{1} \frac{f(x)}{\sqrt{1 - x^2}}\,dx. \tag{4.6.11} \]

Exercises 4.6.4 and 4.6.5 explore how to choose the sampling points and the
weights in such settings.


Product rules 


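Every one-dimensional integration rule has a higher-dimensional counterpart,
called a product rule. If the rule in one dimension is

\[ \int_a^b f(x)\,dx \approx \sum_{i=1}^{k} w_i\, f(p_i), \tag{4.6.12} \]

then the corresponding rule in $n$ dimensions is

\[ \int_{[a,b]^n} f(\mathbf{x})\,|d^n\mathbf{x}| \approx \sum_{1 \le i_1, \dots, i_n \le k} w_{i_1} \cdots w_{i_n}\, f\begin{pmatrix} p_{i_1} \\ \vdots \\ p_{i_n} \end{pmatrix}. \tag{4.6.13} \]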


The following proposition shows why product rules are a useful way of adapt- 
ing one-dimensional integration rules to several variables. 


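Proposition 4.6.5 (Product rules). If $f_1, \dots, f_n$ are functions that are
integrated exactly by an integration rule:

\[ \int_a^b f_j(x)\,dx = \sum_{i=1}^{k} w_i\, f_j(p_i) \quad\text{for } j = 1, \dots, n, \tag{4.6.14} \]

then the product

\[ f(\mathbf{x}) \overset{\text{def}}{=} f_1(x_1)\, f_2(x_2) \cdots f_n(x_n) \tag{4.6.15} \]

is integrated exactly by the corresponding product rule over $[a, b]^n$.


Figure 4.6.3.

Weights for approximating the integral over a square, using the
two-dimensional Simpson's rule. Each weight is multiplied by
$(b - a)^2 / (36 n^2)$.


Proof. This follows immediately from Proposition 4.1.12. Indeed,

\[
\begin{aligned}
\int_{[a,b]^n} f(\mathbf{x})\,|d^n\mathbf{x}|
 &= \Bigl( \int_a^b f_1(x_1)\,dx_1 \Bigr) \cdots \Bigl( \int_a^b f_n(x_n)\,dx_n \Bigr) \\
 &= \Bigl( \sum_{i=1}^{k} w_i\, f_1(p_i) \Bigr) \cdots \Bigl( \sum_{i=1}^{k} w_i\, f_n(p_i) \Bigr)
  = \sum_{1 \le i_1, \dots, i_n \le k} w_{i_1} \cdots w_{i_n}\, f\begin{pmatrix} p_{i_1} \\ \vdots \\ p_{i_n} \end{pmatrix}. \quad\square
\end{aligned}
\tag{4.6.16}
\]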


Example 4.6.6 (Simpson's rule in two dimensions). The two-dimensional
form of Simpson's rule will approximate the integral over a square, using
the weights shown in Figure 4.6.3 (each multiplied by $(b - a)^2 / (36 n^2)$).

In the very simple case where we divide the square into only four subsquares,
and sample the function at each vertex, we have nine samples in all, as shown
in Figure 4.6.4. If we do this with the square of side length 2 centered at 0,
Equation 4.6.13 then becomes
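Here $b - a = 2$ (since we are integrating from -1 to 1), and
$n = 1$ (this square corresponds, in the one-dimensional case, to the
first piece out of $n$ pieces, as shown in Figure 4.6.1). So

\[ \frac{(b - a)^2}{36 n^2} = \frac{4}{36} = \frac{1}{9}. \]

Each two-dimensional weight is the product of two of the
one-dimensional Simpson weights, $w_1$, $w_2$, and $w_3$, where
$w_1 = w_3 = \frac13 \cdot 1 = \frac13$, and $w_2 = \frac13 \cdot 4 = \frac43$.


\[
\begin{aligned}
\int_{[-1,1]^2} f(\mathbf{x})\,|d^2\mathbf{x}| \approx{}
 & \underbrace{w_1 w_1}_{1/9} f\begin{pmatrix} -1 \\ -1 \end{pmatrix} + \underbrace{w_1 w_2}_{4/9} f\begin{pmatrix} 0 \\ -1 \end{pmatrix} + \underbrace{w_1 w_3}_{1/9} f\begin{pmatrix} 1 \\ -1 \end{pmatrix} \\
 {}+{} & \underbrace{w_2 w_1}_{4/9} f\begin{pmatrix} -1 \\ 0 \end{pmatrix} + \underbrace{w_2 w_2}_{16/9} f\begin{pmatrix} 0 \\ 0 \end{pmatrix} + \underbrace{w_2 w_3}_{4/9} f\begin{pmatrix} 1 \\ 0 \end{pmatrix} \\
 {}+{} & \underbrace{w_3 w_1}_{1/9} f\begin{pmatrix} -1 \\ 1 \end{pmatrix} + \underbrace{w_3 w_2}_{4/9} f\begin{pmatrix} 0 \\ 1 \end{pmatrix} + \underbrace{w_3 w_3}_{1/9} f\begin{pmatrix} 1 \\ 1 \end{pmatrix}.
\end{aligned}
\tag{4.6.17}
\]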

Theorem 4.6.2 and Proposition 4.6.5 tell us that this two-dimensional
Simpson's method will integrate exactly the polynomials

\[ 1,\ x,\ y,\ x^2,\ xy,\ y^2,\ x^3,\ x^2 y,\ xy^2,\ y^3, \tag{4.6.18} \]

and many others (for instance, $x^2 y^3$), but not $x^4$. It will also integrate exactly
functions which are piecewise polynomials of degree at most three on each of
the unit squares, as in Figure 4.6.4. △
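The nine-point rule on the square $[-1, 1]^2$ is short enough to write out and test. The Python sketch below is an illustration under our own naming (`simpson_2d_square` is not from the text): each two-dimensional weight is the product of two one-dimensional Simpson weights 1/3, 4/3, 1/3.

```python
def simpson_2d_square(f):
    """Nine-point product Simpson rule on [-1, 1] x [-1, 1] (n = 1 in each direction)."""
    nodes = [-1.0, 0.0, 1.0]
    weights = [1.0 / 3.0, 4.0 / 3.0, 1.0 / 3.0]   # one-dimensional Simpson weights
    return sum(wx * wy * f(x, y)
               for x, wx in zip(nodes, weights)
               for y, wy in zip(nodes, weights))

print(simpson_2d_square(lambda x, y: x**2 * y**2))   # exact integral is 4/9
print(simpson_2d_square(lambda x, y: x**2 * y**3))   # exact integral is 0
print(simpson_2d_square(lambda x, y: x**4))          # rule gives 4/3, integral is 4/5
```

The first two monomials are products of one-variable polynomials of degree at most 3, so Proposition 4.6.5 applies; $x^4$ is not, and the rule visibly fails on it.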

Gaussian rules also lead to product rules for integrating functions in several 
variables, which will very effectively integrate polynomials in several variables 
of high degree. 


    1/9     4/9     1/9

    4/9    16/9     4/9

    1/9     4/9     1/9


Figure 4.6.4.

If we divide a square into only four subsquares, Simpson's
method in two dimensions gives the weights above.


Problems with higher dimensional Riemann sums 

Both Simpson’s rule and Gaussian rules are versions of Riemann sums. There 
are at least two serious difficulties with Riemann sums in higher dimensions. 
One is that the fancier the method, the smoother the function to be integrated 
needs to be in order for the method to work according to specs. In one dimension 
this usually isn’t serious; if there are discontinuities, you break up the interval 
into several intervals at the points where the function has singularities. But 
in several dimensions, especially if you are trying to evaluate a volume by 
integrating a characteristic function, you will only be able to maneuver around 
the discontinuity if you already know the answer. For integrals of this sort, it 
isn’t clear that delicate, high-order methods like Gaussians with many points 
are better than plain midpoint Riemann sums. 

The other problem has to do with the magnitude of the computation. In one
dimension, there is nothing unusual in using 100 or 1000 points for Simpson's
method or Gaussian rules, in order to gain the desired accuracy (which might be
10 significant digits). As the dimension goes up, this sort of thing becomes first
alarmingly expensive, and then utterly impossible. In dimension 4, a Simpson
approximation using 100 points to a side involves 100 000 000 function
evaluations, within reason for today's computers if you are willing to wait a while;
with 1 000 points to a side it involves $10^{12}$ function evaluations, which would
tie up the biggest computers for several days. By the time you get to
dimension 9, this sort of thing becomes totally unreasonable unless you decrease your


A random number generator can be used to construct a code:
you can add a random sequence of bits to your message, bit by bit
(with no carries, so that 1 + 1 = 0); to decode, subtract it again. If
your message (encoded as bits) is the first line below, and the
second line is generated by a random number generator, then the sum
of the two will appear random as well, and thus undecipherable:

10 11 10 10 1111 01 01
01 01 10 10 0000 11 01
11 10 00 00 1111 10 00


The points in A referred to in Definition 4.6.7 will no doubt be
chosen using some pseudo-random number generator; if this is
biased, the bias will affect both the expected value and the expected
variance, so the entire scheme becomes unreliable. On the other
hand, off-the-shelf random number generators come with the
guarantee that if you can detect a bias, you can use that information
to factor large numbers and, in particular, crack most commercial
encoding schemes. This could be a quick way of getting rich (or
landing in jail).

The Monte Carlo program is 
found in Appendix B.2, and at the 
website given in the preface. 
desired accuracy: $100^9 = 10^{18}$ function evaluations would take more than a
billion seconds (about 32 years) even on the very fastest computers, but $10^9$ is
within reason, and should give a couple of significant digits.

When the dimension gets higher than 10, Simpson's method and all similar
methods become totally impossible, even if you are satisfied with one significant
digit, just to give an order of magnitude. These situations call for the
probabilistic methods described below. They very quickly give a couple of significant
digits (with high probability: you are never sure), but we will see that it is next
to impossible to get really good accuracy (say six significant digits).

Monte Carlo methods 

Suppose that we want to find the average of $|\det A|$ for all $n \times n$ matrices with
all entries chosen at random in the unit interval. We computed this integral
in Example 4.5.7 when $n = 2$, and found 13/54. The thought of computing
the integral exactly for 3 × 3 matrices is awe-inspiring. How about numerical
integration? If we want to use Simpson's rule, even with just 10 points on
the side of the cube, we will need to evaluate $10^9$ determinants, each a sum
of six products of three numbers. This is not out of the question with today's
computers, but a pretty massive computation. Even then, we still will probably
know only two significant digits, because the integrand isn't differentiable.

In this situation, there is a much better approach. Simply pick numbers at
random in the nine-dimensional cube, evaluate the determinant of the 3 × 3
matrix that you make from these numbers, and take the average. A similar method
will allow you to evaluate (with some precision) integrals even of domains of
dimension 20, or 100, or perhaps more.

The theorem that describes Monte Carlo methods is the central limit theorem 
from probability, stated (as Theorem 4.2.11) in Section 4.2 on probability. 

When trying to approximate $\int_A f(\mathbf{x})\,|d^n\mathbf{x}|$, the individual experiment is to
choose a point in $A$ at random, and evaluate $f$ there. This experiment has a
certain expected value $E$, which is what we are trying to discover, and a certain
standard deviation $\sigma$.

Unfortunately, both are unknown, but running the Monte Carlo algorithm
gives you an approximation of both. It is wiser to compute both at once, as the
approximation you get for the standard deviation gives an idea of how accurate
the approximation to the expected value is.

Definition 4.6.7 (Monte Carlo method). The Monte Carlo algorithm
for computing integrals consists of

(1) Choosing points $\mathbf{x}_i$, $i = 1, \dots, N$ in $A$ at random, equidistributed in
$A$.

(2) Evaluating $a_i = f(\mathbf{x}_i)$ and $b_i = (f(\mathbf{x}_i))^2$.

(3) Computing $\bar{a} = \frac{1}{N} \sum_{i=1}^{N} a_i$ and $s^2 = \frac{1}{N} \sum_{i=1}^{N} b_i - \bar{a}^2$.





Probabilistic methods of inte- 
gration are like political polls. You 
don’t pay much (if anything) for 
going to higher dimensions, just as 
you don’t need to poll more people 
about a Presidential race than for 
a Senate race. 

The real difficulty with Monte Carlo methods is making a good
random number generator, just as in polling the real problem is
making sure your sample is not biased. In the 1936 presidential
election, the Literary Digest predicted that Alf Landon would beat
Franklin D. Roosevelt, on the basis of two million mock ballots
returned from a mass mailing. The mailing list was composed of
people who owned cars or telephones, which during the Depression was
hardly a random sampling.

Pollsters then began polling far fewer people (typically, about 10
thousand), paying more attention to getting representative samples.
Still, in 1948 the Tribune in Chicago went to press with the
headline, "Dewey Defeats Truman"; polls had unanimously predicted a
crushing defeat for Truman. One problem was that some interviewers
avoided low-income neighborhoods. Another was calling the
election too early: Gallup stopped polling two weeks before the
election.


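Why $|d^9\mathbf{x}|$ in Equation 4.6.23? To each point $\mathbf{x} \in \mathbb{R}^9$, with
coordinates $x_1, \dots, x_9$, we can associate the determinant of the
3 × 3 matrix

\[ A = \begin{bmatrix} x_1 & x_2 & x_3 \\ x_4 & x_5 & x_6 \\ x_7 & x_8 & x_9 \end{bmatrix}. \]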

The number $\bar{a}$ is our approximation to the integral, and the number $s$ is our
approximation to the standard deviation $\sigma$.
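The three steps of Definition 4.6.7 can be sketched in Python as follows. This is not the Monte Carlo program of Appendix B.2; it is a minimal illustration with names of our own choosing (`monte_carlo`, `abs_det3`), using the standard `random` module as the pseudo-random number generator and the 3 × 3 determinant problem as the test integrand.

```python
import math
import random

def monte_carlo(f, sample_point, num_samples):
    """Return (a_bar, s): the sample mean of f and the estimated standard deviation."""
    sum_a = 0.0
    sum_b = 0.0
    for _ in range(num_samples):
        value = f(sample_point())     # step (1): a random point, then evaluate f
        sum_a += value                # step (2): a_i = f(x_i)
        sum_b += value * value        #           b_i = f(x_i)^2
    a_bar = sum_a / num_samples       # step (3): mean and standard deviation
    s = math.sqrt(max(sum_b / num_samples - a_bar * a_bar, 0.0))
    return a_bar, s

def abs_det3(p):
    """|det| of the 3 x 3 matrix whose rows are (x1 x2 x3), (x4 x5 x6), (x7 x8 x9)."""
    x1, x2, x3, x4, x5, x6, x7, x8, x9 = p
    return abs(x1 * (x5 * x9 - x6 * x8)
               - x2 * (x4 * x9 - x6 * x7)
               + x3 * (x4 * x8 - x5 * x7))

rng = random.Random(1)  # fixed seed, chosen arbitrarily, for reproducibility
a_bar, s = monte_carlo(abs_det3, lambda: [rng.random() for _ in range(9)], 100_000)
print(a_bar, s)
```

Because the unit cube has volume 1, the sample mean is directly an estimate of the integral; for a region $A$ of another volume one would multiply the mean by that volume.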

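The central limit theorem asserts that the probability that $\bar{a}$ is between

\[ E + a\sigma/\sqrt{N} \quad\text{and}\quad E + b\sigma/\sqrt{N} \tag{4.6.19} \]

is approximately

\[ \frac{1}{\sqrt{2\pi}} \int_a^b e^{-t^2/2}\,dt. \tag{4.6.20} \]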


In principle, everything can be derived from this formula: let us see how this 
allows us to see how many times the experiment needs to be repeated in order 
to know an integral with a certain precision and a certain confidence. 

For instance, suppose we want to compute an integral to within one part in a
thousand. We can't do that by Monte Carlo: we can never be sure of anything.
But we can say that with probability 98%, the estimate $\bar{a}$ is correct to one part
in a thousand, i.e., that

\[ \frac{|E - \bar{a}|}{E} \le .001. \tag{4.6.21} \]

This requires knowing something about the bell curve: with probability 98%
the result is within 2.36 standard deviations of the mean. Since the standard
deviation of $\bar{a}$ is $\sigma/\sqrt{N}$, to arrange our desired relative error, we need

\[ \frac{2.36\,\sigma}{\sqrt{N}\,E} \le .001, \quad\text{i.e.,}\quad N \ge \frac{5.57 \cdot 10^6 \cdot \sigma^2}{E^2}. \tag{4.6.22} \]
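Equation 4.6.22 is a one-line computation once rough guesses for $\sigma$ and $E$ are available. The helper below is a sketch under our own naming (`repetitions_needed` is not from the text); the default of 2.36 standard deviations is the figure quoted above for 98% confidence, and the sample values 0.13 are placeholders of the right order of magnitude rather than measured data.

```python
import math

def repetitions_needed(sigma, expected, rel_error=0.001, num_sigmas=2.36):
    """Smallest N with num_sigmas * sigma / (sqrt(N) * expected) <= rel_error."""
    return math.ceil((num_sigmas * sigma / (rel_error * expected)) ** 2)

# When sigma and E are of the same size, N is about 5.6 million.
print(repetitions_needed(sigma=0.13, expected=0.13))

# Asking for only one part in a hundred divides N by 100^2 / 10^2 = 100.
print(repetitions_needed(sigma=0.13, expected=0.13, rel_error=0.01))
```

Only the ratio $\sigma/E$ enters, which is why a rough estimate of $\sigma$ is enough for planning a run, and why each additional digit of accuracy costs a factor of 100 in repetitions.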


Example 4.6.8 (Monte Carlo). In Example 4.5.7 we computed the expected
value for the determinant of a 2 × 2 matrix. Now let us run the program Monte
Carlo to approximate

\[ \int_C |\det A|\,|d^9\mathbf{x}|, \tag{4.6.23} \]

i.e., to evaluate the average absolute value of the determinant of a 3 × 3 matrix
with entries chosen at random in $[0, 1]$.

Several runs of length 10000 (essentially instantaneous)⁸ gave values of .127,
.129, .129, .128 as values for $s$ (guesses for the standard deviation $\sigma$). For these
same runs, the computer gave the following estimates of the integral:

\[ .13625,\ .133150,\ .135197,\ .13473. \tag{4.6.24} \]

It seems safe to guess that $\sigma < .13$, and also $E \approx .13$; this last guess is not
as precise as we would like, neither do we have the confidence in it that is

⁸ On a 1998 computer, a run of 5000000 repetitions of the experiment took about
16 seconds. This involves about 3.5 billion arithmetic operations (additions,
multiplications, divisions), about 3/4 of which are the calls to the random number generator.
Note that when estimating how many times we need to repeat an
experiment, we don't need several digits of $\sigma$; only the order of
magnitude matters.


required. Using these numbers to estimate how many times the experiment
should be repeated so that with probability 98%, the result has a relative error
at most .001, we use Equation 4.6.22 which says that we need about 5 000 000
repetitions to achieve this precision and confidence. This time the computation
is not instantaneous, and yields $E = 0.134712$, with probability 98% that the
absolute error is at most 0.000130. This is good enough: surely the digits 134
are right, but the fourth digit, 7, might be off by 1. △


4.7 Other Pavings 


To measure the standard de- 
viation of the income of Ameri- 
cans, you would want to subdivide 
the U.S. by census tracts, not by 
closely spaced latitudes and lon- 
gitudes, because that is how the 
data is provided. 


The dyadic paving is the most rigid and restrictive we can think of, making
most theorems easiest to prove. But in many settings the rigidity of the dyadic
paving $\mathcal{D}_N$ is not necessary or best. Often we will want to have more "paving
tiles" where the function varies rapidly, and bigger ones elsewhere, shaped to
fit our domain of integration. In some situations, a particular paving is more
or less imposed.

Example 4.7.1 (Measuring rainfall). Imagine that you wish to measure 
rainfall in liters per square kilometer that fell over South America during Octo- 
ber, 1996. One possibility would be to use dyadic cubes (squares in this case), 
measuring the rainfall at the center of each cube and seeing what happens as 
the decomposition gets finer and finer. One problem with this approach, which 
we discuss in Chapter 5, is that the dyadic squares lie in a plane, and the surface 
of South America does not. 

Another problem is that using dyadic cubes would complicate the collection 
of data. In practice, you might break South America up into countries, and 
assign to each the product of its area and the rainfall that fell at a particular 
point in the country, perhaps its capital; you would then add these products 
together. To get a more accurate estimate of the integral you would use a finer 
decomposition, like provinces or counties. A 


Here we will show that very general pavings can be used to compute integrals. 

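Definition 4.7.2 (A paving of $X \subset \mathbb{R}^n$). A paving of a subset $X \subset \mathbb{R}^n$ is
a collection $\mathcal{P}$ of subsets $P \subset X$ such that

$$\bigcup_{P \in \mathcal{P}} P = X, \quad\text{and}\quad \operatorname{vol}_n(P_1 \cap P_2) = 0 \ \text{ (when } P_1, P_2 \in \mathcal{P} \text{ and } P_1 \neq P_2\text{)}. \tag{4.7.1}$$

The set of all $P \in \mathcal{P}$ completely paves $X$, and two “tiles” can overlap only in a set of volume 0.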


Definition 4.7.3 (The boundary of a paving of $X \subset \mathbb{R}^n$). The boundary
$\partial\mathcal{P}$ of $\mathcal{P}$ is the set of $\mathbf{x} \in \mathbb{R}^n$ such that every neighborhood of $\mathbf{x}$ intersects
at least two elements $P \in \mathcal{P}$. It includes of course the overlaps of pairs of
tiles.

In contrast to the upper and lower sums of the dyadic decompositions (Equation 4.1.18), where $\operatorname{vol}_n C$ is the same for any cube $C$ at a given resolution $N$, in Equation 4.7.3, $\operatorname{vol}_n P$ is not necessarily the same for all “paving tiles” $P \in \mathcal{P}_N$.

Recall that $M_P(f)$ is the maximum value of $f(\mathbf{x})$ for $\mathbf{x} \in P$; similarly, $m_P(f)$ is the minimum.

What we called $U_N(f)$ in Section 4.1 would be called $U_{\mathcal{D}_N}(f)$ using this notation. We will often omit the subscript $\mathcal{D}_N$ (which you will recall denotes the collection of cubes $C$ at a single level $N$) when referring to the dyadic decompositions, both to lighten the notation and to avoid confusion between $\mathcal{P}$ and $\mathcal{D}$, which, set in small subscript type, can look similar.



If you think of the $P \in \mathcal{P}$ as tiles, then the boundary $\partial\mathcal{P}$ is like the grout
lines between the tiles: exceedingly thin grout lines, since we will usually be
interested in pavings such that $\operatorname{vol}_n \partial\mathcal{P} = 0$.

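Definition 4.7.4 (Nested partition). A sequence $\mathcal{P}_N$ of pavings of $X \subset \mathbb{R}^n$
is called a nested partition of $X$ if

(1) $\mathcal{P}_{N+1}$ refines $\mathcal{P}_N$: every piece of $\mathcal{P}_{N+1}$ is contained in a piece of $\mathcal{P}_N$.

(2) All the boundaries have volume 0: $\operatorname{vol}_n(\partial\mathcal{P}_N) = 0$ for every $N$.

(3) The pieces of $\mathcal{P}_N$ shrink to points as $N \to \infty$:

$$\lim_{N\to\infty} \sup_{P \in \mathcal{P}_N} \operatorname{diam} P = 0. \tag{4.7.2}$$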


For example, paving the United States by counties refines the paving by 
states: no county lies partly in one state and partly in another. A further
refinement is provided by census tracts. (But this is not a nested partition, 
because the third requirement isn’t met.) 

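We can define an upper sum $U_{\mathcal{P}_N}(f)$ and a lower sum $L_{\mathcal{P}_N}(f)$ with respect
to any paving:

$$U_{\mathcal{P}_N}(f) = \sum_{P \in \mathcal{P}_N} M_P(f) \operatorname{vol}_n P \quad\text{and}\quad L_{\mathcal{P}_N}(f) = \sum_{P \in \mathcal{P}_N} m_P(f) \operatorname{vol}_n P. \tag{4.7.3}$$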

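Theorem 4.7.5. Let $X \subset \mathbb{R}^n$ be a bounded subset, and $\mathcal{P}_N$ be a nested
partition of $X$. If the boundary $\partial X$ satisfies $\operatorname{vol}_n(\partial X) = 0$, and $f : \mathbb{R}^n \to \mathbb{R}$
is integrable, then the limits

$$\lim_{N\to\infty} U_{\mathcal{P}_N}(f) \quad\text{and}\quad \lim_{N\to\infty} L_{\mathcal{P}_N}(f) \tag{4.7.4}$$

both exist, and are equal to

$$\int_X f(\mathbf{x})\,|d^n\mathbf{x}|. \tag{4.7.5}$$

The theorem is proved in Appendix A.14.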
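To make the upper and lower sums concrete, here is a minimal sketch in one dimension. Everything specific in it is an assumption chosen for illustration: the set X = [0, 1], the integrand f(x) = x^2 (increasing, so its maximum and minimum on a tile sit at the endpoints), the initial tiles of unequal length, and the helper names `upper_lower_sums` and `refine`. Cutting every tile at its midpoint gives a nested partition, and both sums should squeeze the integral 1/3.

```python
from fractions import Fraction

def f(x):
    # Hypothetical integrand: increasing on [0, 1], so on each tile the
    # maximum sits at the right endpoint and the minimum at the left one.
    return x * x

def upper_lower_sums(breakpoints):
    """Upper and lower sums (Equation 4.7.3) for the paving of [0, 1]
    by the intervals between consecutive breakpoints."""
    upper = lower = Fraction(0)
    for a, b in zip(breakpoints, breakpoints[1:]):
        vol = b - a              # one-dimensional volume of the tile [a, b]
        upper += f(b) * vol      # M_P(f) * vol P
        lower += f(a) * vol      # m_P(f) * vol P
    return upper, lower

def refine(breakpoints):
    """Cut every tile at its midpoint, so each new tile lies in an old one."""
    finer = []
    for a, b in zip(breakpoints, breakpoints[1:]):
        finer += [a, (a + b) / 2]
    return finer + [breakpoints[-1]]

# Tiles of unequal length: [0, 1/10], [1/10, 1/2], [1/2, 1].
paving = [Fraction(0), Fraction(1, 10), Fraction(1, 2), Fraction(1)]
for _ in range(10):
    paving = refine(paving)

U, L = upper_lower_sums(paving)
print(float(L), float(U))        # both close to the integral 1/3
```

Exact rational arithmetic is used so that the inequality between the lower sum, the integral, and the upper sum can be checked without rounding.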

4.8 Determinants


In higher dimensions the deter- 
minant is important because it has 
a geometric interpretation, as a 
signed volume. 


The determinant is a function of square matrices. In Section 1.4 we introduced
determinants of 2 x 2 and 3 x 3 matrices, and saw that they have a geometric
interpretation: the first gives the area of the parallelogram spanned by two vectors;
the second gives the volume of the parallelepiped spanned by three vectors.
In higher dimensions the determinant also has a geometric interpretation, as a
signed volume; it is this that makes the determinant important.

We will use determinants heavily throughout the remainder of the book: forms, to be discussed in Chapter 6, are built on the determinant.

As we did for the determinant of a 2 x 2 or 3 x 3 matrix, we will think of the determinant as a function of n vectors rather than as a function of a matrix. This is a minor point, since whenever you have n vectors in $\mathbb{R}^n$, you can always place them side by side to make an n x n matrix.


Once matrices are bigger than 3x3, the formulas for computing the de- 
terminant are far too messy for hand computation — too time-consuming even 
for computers, once a matrix is even moderately large. We will see (Equation 
4.8.21) that the determinant can be computed much more reasonably by row 
(or column) reduction. 

In order to obtain the volume interpretation most readily, we shall define the 
determinant by the three properties that characterize it. 


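Definition 4.8.1 (The determinant). The determinant

$$\det A = \det \begin{bmatrix} | & | & & | \\ \vec a_1 & \vec a_2 & \cdots & \vec a_n \\ | & | & & | \end{bmatrix} = \det(\vec a_1, \vec a_2, \dots, \vec a_n) \tag{4.8.1}$$

is the unique real-valued function of n vectors in $\mathbb{R}^n$ with the following properties:

(1) Multilinearity: det A is linear with respect to each of its arguments.
That is, if one of the arguments (one of the vectors) can be written

$$\vec a_i = \alpha \vec u + \beta \vec w, \tag{4.8.2}$$

then

$$\begin{aligned} \det(\vec a_1, \dots, \vec a_{i-1}, (\alpha\vec u + \beta\vec w), \vec a_{i+1}, \dots, \vec a_n) &= \alpha \det(\vec a_1, \dots, \vec a_{i-1}, \vec u, \vec a_{i+1}, \dots, \vec a_n) \\ &\quad + \beta \det(\vec a_1, \dots, \vec a_{i-1}, \vec w, \vec a_{i+1}, \dots, \vec a_n). \end{aligned} \tag{4.8.3}$$

(2) Antisymmetry: det A is antisymmetric. Exchanging any two arguments
changes its sign:

$$\det(\vec a_1, \dots, \vec a_i, \dots, \vec a_j, \dots, \vec a_n) = -\det(\vec a_1, \dots, \vec a_j, \dots, \vec a_i, \dots, \vec a_n). \tag{4.8.4}$$

The properties of multilinearity and antisymmetry will come up often in Chapter 6.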


More generally, normalization 
means “setting the scale.” For 
example, physicists may normal- 
ize units to make the speed of 
light 1. Normalizing the determi- 
nant means setting the scale for n- 
dimensional volume: deciding that 
the unit “n-cube” has volume 1. 


(3) Normalization: the determinant of the identity matrix is 1, i.e.,

$$\det I = \det(\vec e_1, \vec e_2, \dots, \vec e_n) = 1, \tag{4.8.5}$$

where $\vec e_1, \dots, \vec e_n$ are the standard basis vectors.


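Example 4.8.2 (Properties of the determinant). (1) Multilinearity: if
$\alpha = -1$, $\beta = 2$, and

$$\vec u = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad \vec w = \begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix}, \quad\text{so that}\quad \alpha\vec u + \beta\vec w = \begin{bmatrix} -1 \\ 0 \\ -1 \end{bmatrix} + \begin{bmatrix} 4 \\ 4 \\ 6 \end{bmatrix} = \begin{bmatrix} 3 \\ 4 \\ 5 \end{bmatrix},$$

then

$$\underbrace{\det \begin{bmatrix} 1 & 3 & 3 \\ 2 & 4 & 1 \\ 0 & 5 & 1 \end{bmatrix}}_{23} = \underbrace{-1 \det \begin{bmatrix} 1 & 1 & 3 \\ 2 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}}_{-1 \times 3 = -3} + \underbrace{2 \det \begin{bmatrix} 1 & 2 & 3 \\ 2 & 2 & 1 \\ 0 & 3 & 1 \end{bmatrix}}_{2 \times 13 = 26} \tag{4.8.6}$$

as you can check using Definition 1.4.15.

Remark 4.8.3. Exercise 4.8.4 explores some immediate consequences of Definition 4.8.1: if a matrix has a column of zeroes, or if it has two identical columns, its determinant is 0.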


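(2) Antisymmetry:

$$\underbrace{\det \begin{bmatrix} 1 & 3 & 3 \\ 2 & 4 & 1 \\ 0 & 5 & 1 \end{bmatrix}}_{23} = -\underbrace{\det \begin{bmatrix} 1 & 3 & 3 \\ 2 & 1 & 4 \\ 0 & 1 & 5 \end{bmatrix}}_{-23}. \tag{4.8.7}$$

(3) Normalization:

$$\det \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = 1((1 \times 1) - 0) = 1. \tag{4.8.8}$$

Our examples are limited to 3 x 3 matrices because we haven’t shown yet
how to compute larger ones. △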
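The three computations of Example 4.8.2 can be checked mechanically. The sketch below is only an illustration: the helper name `det3` is ours, and it uses the usual 3 x 3 formula expanded along the first column, which we assume agrees with Definition 1.4.15. Columns are passed as tuples, matching the view of the determinant as a function of n vectors.

```python
def det3(c1, c2, c3):
    """Determinant of the 3 x 3 matrix whose columns are c1, c2, c3."""
    (a1, a2, a3), (b1, b2, b3), (d1, d2, d3) = c1, c2, c3
    return (a1 * (b2 * d3 - b3 * d2)
            - a2 * (b1 * d3 - b3 * d1)
            + a3 * (b1 * d2 - b2 * d1))

a1, a3 = (1, 2, 0), (3, 1, 1)          # first and third columns of the example
u, w = (1, 0, 1), (2, 2, 3)
alpha, beta = -1, 2
middle = tuple(alpha * x + beta * y for x, y in zip(u, w))   # (3, 4, 5)

lhs = det3(a1, middle, a3)                                   # left side of 4.8.6
rhs = alpha * det3(a1, u, a3) + beta * det3(a1, w, a3)       # right side of 4.8.6
swapped = det3(a1, a3, middle)         # exchange the second and third columns
identity = det3((1, 0, 0), (0, 1, 0), (0, 0, 1))
print(lhs, rhs, swapped, identity)     # prints: 23 23 -23 1
```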

In order to see that Definition 4.8.1 is reasonable, we will want the following 
theorem: 

Theorem 4.8.4 (Existence and uniqueness of the determinant). 

There exists a function det A satisfying the three properties of the deter- 
minant, and it is unique. 

The proofs of existence and uniqueness are quite different, with a somewhat 
lengthy but necessary construction for each. The outline for the proof is as 
follows: 

First we shall use a computer program to construct a function D(A) 
by a process called “development according to the first column.” Of 
course this could be developed differently, e.g., according to the first 
row, but you can show in Exercise 4.8.13 that the result is equivalent 
to this definition. Then (in Appendix A.15) we shall prove that D(A)
satisfies the properties of det A, thus establishing existence of a function 
that satisfies the definition of determinant. 

Finally we shall proceed by “column operations” to evaluate this function
D(A) and show that it is unique, which will prove uniqueness of the
determinant. This will simultaneously give an effective algorithm for
computing determinants.

Development according to the first column. Consider the function

$$D(A) = \sum_{i=1}^{n} (-1)^{1+i} a_{i,1} D(A_{i,1}), \tag{4.8.9}$$

where A is an n x n matrix and $A_{i,j}$ is the (n - 1) x (n - 1) matrix obtained
from A by erasing the ith row and the jth column, as illustrated by Example
4.8.5. The formula may look unfriendly, but it’s not really complicated. As
shown in Equation 4.8.10, each term of the sum is the product of the entry $a_{i,1}$
and D of the new, smaller matrix obtained from A by erasing the ith row and
the first column; the $(-1)^{1+i}$ simply assigns a sign to the term.

$$D(A) = \sum_{i=1}^{n} \underbrace{(-1)^{1+i}}_{\text{tells whether } + \text{ or } -} \; \underbrace{a_{i,1}\, D(A_{i,1})}_{\text{product of } a_{i,1} \text{ and } D \text{ of smaller matrix}} \tag{4.8.10}$$

For this to work we must say that the D of a 1 x 1 “matrix,” i.e., a number, is the number itself. For example, det[7] = 7.

Our candidate determinant D is thus recursive: D of an n x n matrix is the
sum of n terms, each involving D’s of (n - 1) x (n - 1) matrices; in turn, the
D of each (n - 1) x (n - 1) matrix is the sum of (n - 1) terms, each involving
D’s of (n - 2) x (n - 2) matrices .... (Of course, when one deletes the first
column of the (n - 1) x (n - 1) matrix, it is the second column of the original
matrix, and so on.)


Example 4.8.5 (The function D(A)). If

$$A = \begin{bmatrix} 1 & 3 & 4 \\ 0 & 1 & 1 \\ 1 & 2 & 0 \end{bmatrix}, \quad\text{then}\quad A_{2,1} = \begin{bmatrix} 3 & 4 \\ 2 & 0 \end{bmatrix}, \tag{4.8.11}$$

and Equation 4.8.9 corresponds to

$$D(A) = \underbrace{1\, D\!\left(\begin{bmatrix} 1 & 1 \\ 2 & 0 \end{bmatrix}\right)}_{i=1} - \underbrace{0\, D\!\left(\begin{bmatrix} 3 & 4 \\ 2 & 0 \end{bmatrix}\right)}_{i=2} + \underbrace{1\, D\!\left(\begin{bmatrix} 3 & 4 \\ 1 & 1 \end{bmatrix}\right)}_{i=3}. \tag{4.8.12}$$

The first term is positive because when i = 1, then 1 + i = 2 and we have
$(-1)^2 = 1$; the second is negative, because $(-1)^3 = -1$, and so on.

Applying Equation 4.8.9 to each of these 2 x 2 matrices gives:

$$\begin{aligned} D\!\left(\begin{bmatrix} 1 & 1 \\ 2 & 0 \end{bmatrix}\right) &= 1D(0) - 2D(1) = 0 - 2 = -2; \\ D\!\left(\begin{bmatrix} 3 & 4 \\ 2 & 0 \end{bmatrix}\right) &= 3D(0) - 2D(4) = -8; \\ D\!\left(\begin{bmatrix} 3 & 4 \\ 1 & 1 \end{bmatrix}\right) &= 3D(1) - 1D(4) = -1, \end{aligned} \tag{4.8.13}$$

so that D of our original 3 x 3 matrix is 1(-2) - 0 + 1(-1) = -3. △

The Pascal program Determinant, in Appendix B.3, implements the development
of the determinant according to the first column. It will compute D(A)
for any square matrix of side at most 10; it will run on a personal computer and
in 1998 would compute the determinant of a 10 x 10 matrix in half a second.⁹
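For readers without the Pascal listing at hand, development according to the first column can be sketched in a few lines of Python. This is our own illustrative version of Equation 4.8.9, not a transcription of the program in Appendix B.3; the matrix is assumed to be given as a list of rows, and the list comprehension plays the role of erasing a row and the first column.

```python
def D(A):
    """Development according to the first column (Equation 4.8.9).
    A is a square matrix given as a list of rows."""
    n = len(A)
    if n == 1:
        return A[0][0]               # D of a 1 x 1 matrix is the number itself
    total = 0
    for i in range(n):               # i is the 0-based row index
        # Erase row i and the first column to get the smaller matrix.
        minor = [row[1:] for k, row in enumerate(A) if k != i]
        # With 1-based rows the sign is (-1)^(1+i); 0-based it is (-1)^i.
        total += (-1) ** i * A[i][0] * D(minor)
    return total

print(D([[1, 3, 4],
         [0, 1, 1],
         [1, 2, 0]]))                # prints: -3, as in Example 4.8.5
```

The function calls itself once per entry of the first column, which is exactly what makes it so slow for large matrices.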

This program embodies the recursive nature of the determinant as defined above: the key point is that the function D calls itself. It would be quite a bit more difficult to write this program in Fortran or Basic, which do not allow that sort of thing.

Please note that this program is very time consuming. Suppose that the function
D takes time T(k) to compute the determinant of a k x k matrix. Then,
since it makes k “calls” of D for a (k - 1) x (k - 1) matrix, as well as k multiplications,
k - 1 additions, and k calls of the subroutine “erase,” we see that

$$T(k) > kT(k - 1), \tag{4.8.14}$$

so that T(k) > k! T(1). In 1998, on a fast personal computer, one floating point
operation took about $2 \times 10^{-9}$ second. The time to compute determinants by
this method is at least the factorial of the size of the matrix. For a 15 x 15 matrix,
this means $15! \approx 1.3 \times 10^{12}$ calls or operations, which translates into roughly
45 minutes. And 15 x 15 is not a big matrix; engineers modeling bridges or
airplanes and economists modeling a large company routinely use matrices that
are more than 1 000 x 1 000. So if this program were the only way to compute
determinants, they would be of theoretical interest only. But as we shall soon
show, determinants can also be computed by row or column reduction, which
is immensely more efficient when the matrix is even moderately large.
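The arithmetic behind the 45-minute estimate is easy to reproduce. The sketch below simply multiplies k! by the 1998 figure of 2 x 10^-9 second per operation quoted in the text; the names `SECONDS_PER_OP` and `min_seconds` are ours, and the result is only the lower bound k! T(1), not a measured running time.

```python
import math

SECONDS_PER_OP = 2e-9    # the text's 1998 figure for one floating point operation

def min_seconds(k):
    """Lower bound k! * (time per operation), from T(k) > k! T(1)."""
    return math.factorial(k) * SECONDS_PER_OP

operations_15 = math.factorial(15)       # 1307674368000, about 1.3 x 10^12
minutes_15 = min_seconds(15) / 60        # about 43.6, i.e. roughly 45 minutes
print(operations_15, round(minutes_15, 1))
```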

However, the construction of the function D(A) is most convenient in proving
existence in Theorem 4.8.4.


The number of operations that 
would be needed to compute the 
determinant of a 40 x 40 matrix 
using development by the first col- 
umn is bigger than the number of 
seconds that have elapsed since the 
beginning of the universe. In fact, 
bigger than the number of bil- 
lionths of seconds that have 
elapsed: if you had set a computer 
computing the determinant back 
in the days of the dinosaurs, it 
would have barely begun.

The effective way to compute 
determinants is by column opera- 
tions. 


Proving the existence and uniqueness of the determinant 

We prove existence by verifying that the function D(A) does indeed satisfy
properties (1), (2), and (3) for the determinant det A. This is a messy and
uninspiring exercise in the use of induction, and we have relegated it to Appendix A.15.

Of course, there might be other functions satisfying those properties, but we
will now show that in the course of row reducing (or rather column reducing)
a matrix, we simultaneously compute the determinant. Column reduction of
an n x n matrix takes about $n^3$ operations. For a 40 x 40 matrix, this means
64 000 operations, which would take a reasonably fast computer much less than
one second.

At the same time this algorithm proves uniqueness, since, by Theorem 2.1.8,
given any matrix A, there exists a unique matrix $\tilde A$ in echelon form that can
be obtained from A by row operations. Our discussion will use only properties
(1), (2), and (3), without the function D(A).

We saw in Section 2.1 that a column operation is equivalent to multiplying 
a matrix on the right by an elementary matrix. 

⁹ In about 1990, the same computation took about an hour; in 1996, about a minute.

As mentioned earlier, we will use column operations (Definition 2.1.11) rather than row operations in our construction, because we defined the determinant as a function of the n column vectors. This convention makes the interpretation in terms of volumes simpler, and in any case you will be able to show in Exercise 4.8.13 that row operations could have been used just as well.


Let us check how each of the three column operations affects the determinant.
It turns out that each multiplies the determinant by an appropriate factor $\mu$:

(1) Multiply a column through by a number $m \neq 0$ (multiplying by a type
1 elementary matrix). Clearly, by multilinearity (property (1) above),
this has the effect of multiplying the determinant by the same number,
so

$$\mu = m. \tag{4.8.15}$$

(2) Add a multiple of one column onto another (multiplying by a type 2 elementary
matrix). By property (1), this does not change the determinant,
because

$$\begin{aligned} \det(\vec a_1, \dots, \vec a_i, \dots, (\vec a_j + \beta\vec a_i), \dots, \vec a_n) &= \det(\vec a_1, \dots, \vec a_i, \dots, \vec a_j, \dots, \vec a_n) \\ &\quad + \underbrace{\beta \det(\vec a_1, \dots, \vec a_i, \dots, \vec a_i, \dots, \vec a_n)}_{= 0 \text{ because 2 identical terms } \vec a_i}. \end{aligned} \tag{4.8.16}$$

The second term on the right is zero: two columns are equal (Exercise
4.8.4 b). Therefore

$$\mu = 1. \tag{4.8.17}$$

(3) Exchange two columns (multiplying by a type 3 elementary matrix). By
antisymmetry, this changes the sign of the determinant, so

$$\mu = -1. \tag{4.8.18}$$


Any square matrix can be column reduced until at the end, you either get the
identity, or you get a matrix with a column of zeroes. A sequence of matrices
resulting from column operations can be denoted as follows, with the multipliers
$\mu_i$ of the corresponding determinants on top of arrows for each operation:

$$A \xrightarrow{\mu_1} A_1 \xrightarrow{\mu_2} A_2 \xrightarrow{\mu_3} \cdots \xrightarrow{\mu_{n-1}} A_{n-1} \xrightarrow{\mu_n} A_n, \tag{4.8.19}$$

with $A_n$ in column echelon form. Then, working backwards,

$$\det A_{n-1} = \frac{1}{\mu_n} \det A_n; \quad \det A_{n-2} = \frac{1}{\mu_{n-1}\mu_n} \det A_n; \quad \dots; \quad \det A = \frac{1}{\mu_1 \mu_2 \cdots \mu_{n-1}\mu_n} \det A_n. \tag{4.8.20}$$

Therefore:

(1) If $A_n = I$, then by property (3) we have $\det A_n = 1$, so by Equation
4.8.20,

$$\det A = \frac{1}{\mu_1 \mu_2 \cdots \mu_{n-1}\mu_n}. \tag{4.8.21}$$

(2) If $A_n \neq I$, then by property (1) we have $\det A_n = 0$ (see Exercise 4.8.4),
so

$$\det A = 0. \tag{4.8.22}$$

Equation 4.8.21 is the formula that is really used to compute determinants.
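The procedure of Equations 4.8.19 through 4.8.22 can be sketched as follows. This is a minimal illustration under our own assumptions, not the book's program: the matrix is a list of rows, exact fractions are used to avoid rounding, the function name is hypothetical, and the order in which pivots are chosen is just one convenient choice among many.

```python
from fractions import Fraction

def det_by_column_reduction(A):
    """Determinant via column operations, tracking the multipliers of
    Equations 4.8.15, 4.8.17 and 4.8.18. A is a list of rows."""
    n = len(A)
    # Store the matrix as a list of columns, with exact rational entries.
    cols = [[Fraction(A[i][j]) for i in range(n)] for j in range(n)]
    mu_product = Fraction(1)
    for r in range(n):
        # Look for a column, at position r or later, with a nonzero entry in row r.
        pivot = next((c for c in range(r, n) if cols[c][r] != 0), None)
        if pivot is None:
            return Fraction(0)        # the identity cannot be reached: det A = 0
        if pivot != r:
            cols[r], cols[pivot] = cols[pivot], cols[r]
            mu_product *= -1          # exchanging two columns: multiplier -1
        m = 1 / cols[r][r]
        cols[r] = [m * x for x in cols[r]]
        mu_product *= m               # multiplying a column by m: multiplier m
        for c in range(n):
            if c != r and cols[c][r] != 0:
                factor = cols[c][r]
                # Adding a multiple of one column to another: multiplier 1.
                cols[c] = [x - factor * y for x, y in zip(cols[c], cols[r])]
    return 1 / mu_product             # reduced to I, so Equation 4.8.21 applies

print(det_by_column_reduction([[1, 3, 4], [0, 1, 1], [1, 2, 0]]))   # prints: -3
```

Only the scalings and exchanges change the running product; the many additions of one column to another, which do most of the work, leave it untouched.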


You may object that a different sequence of column operations might lead to a different sequence of $\mu_i$’s, with a different product. If that were the case, it would show that the axioms for the determinant were inconsistent; we know they are consistent because of the existence part of Theorem 4.8.4, proved in Appendix A.15.


Proof of uniqueness of determinant. Suppose we have another function,
$D_1(A)$, which obeys properties (1), (2), and (3). Then for any matrix A,

$$D_1(A) = \frac{1}{\mu_1 \mu_2 \cdots \mu_n} \det A_n = D(A); \tag{4.8.23}$$

i.e., $D_1 = D$. □

Theorems relating matrices and determinants 

In this subsection we group several useful theorems that relate matrices and 
their determinants. 


Theorem 4.8.6. A matrix A is invertible if and only if its determinant is
not zero. 


Proof. This follows immediately from the column-reduction algorithm and 
the uniqueness proof, since along the way we showed, in Equations 4.8.21 and 
4.8.22, that a square matrix has a nonzero determinant if and only if it can be 
column-reduced to the identity. We know from Theorem 2.3.2 that a matrix is 
invertible if and only if it can be row reduced to the identity; the same argument 
applies to column reduction. □ 


A definition that defines an ob- 
ject or operation by its properties 
is called an axiomatic definition. 
The proof of Theorem 4.8.7 should 
convince you that this can be a 
fruitful approach. Imagine trying 
to prove 

D(A)D(B) = D(AB) 
from the recursive definition. 


Now we come to a key property of the determinant, for which we will see a 
geometric interpretation later. It was in order to prove this theorem that we 
defined the determinant by its properties. 

Theorem 4.8.7. If A and B are n x n matrices, then 

det A det B = det(AB). 4.8.24 


Proof. (a) The serious case is the one in which A is invertible. If A is invertible,
consider the function

$$f(B) = \frac{\det(AB)}{\det A}. \tag{4.8.25}$$

As you can readily check (Exercise 4.8.5), it has the properties (1), (2), and (3),
which characterize the determinant function. Since the determinant is uniquely
characterized by those properties, f(B) = det B.

(b) The case where A is not invertible is easy, using what we know about
images and dimensions of linear transformations. If A is not invertible, det A
is zero (Theorem 4.8.6), so the left-hand side of the theorem is zero. The right-hand
side must be zero also: since A is not invertible, rank A < n. Since
$\operatorname{Img}(AB) \subset \operatorname{Img} A$, we have $\operatorname{rank}(AB) \le \operatorname{rank} A < n$, so AB is not invertible
either, and det AB = 0. □
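A numerical spot check of Theorem 4.8.7 may be reassuring, covering both cases of the proof. The sketch below is ours: `det` uses development according to the first column, `matmul` is a plain triple loop, and the matrices A and B are borrowed from Examples 4.8.2 and 4.8.5, while S (the matrix with rows 1 2 3, 4 5 6, 7 8 9) is a convenient non-invertible example.

```python
def det(A):
    """Determinant by development according to the first column."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** i * A[i][0]
               * det([row[1:] for k, row in enumerate(A) if k != i])
               for i in range(len(A)))

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 3, 3], [2, 4, 1], [0, 5, 1]]      # invertible: det A = 23
B = [[1, 3, 4], [0, 1, 1], [1, 2, 0]]      # invertible: det B = -3
S = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]      # not invertible: det S = 0

print(det(matmul(A, B)), det(A) * det(B))  # case (a): the two numbers agree
print(det(matmul(S, B)), det(S) * det(B))  # case (b): both are 0
```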

Theorem 4.8.7, combined with Equations 4.8.15, 4.8.18, and 4.8.17, gives the
following determinants for elementary matrices.

Theorem 4.8.8. The determinant of an elementary matrix equals the determinant
of its transpose:

$$\det E = \det E^T. \tag{4.8.26}$$

Corollary 4.8.9 (Determinants of elementary matrices). The determinant
of a type 1 elementary matrix is m, where $m \neq 0$ is the entry on
the diagonal not required to be 1. The determinant of a type 2 elementary
matrix is 1, and that of a type 3 elementary matrix is -1:

$$\det E_1(i, m) = m, \qquad \det E_2(i, j, x) = 1, \qquad \det E_3(i, j) = -1.$$

The determinant of this type 1 elementary matrix is 2:

$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

The determinant of all type 2 elementary matrices is 1:

$$\begin{bmatrix} 1 & 0 & -3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

The determinant of all type 3 elementary matrices is -1:

$$\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$


Proof. The three types of elementary matrices are described in Definition
2.3.5. For the first and the third types, $E = E^T$, so there is nothing to
prove. For the second type, all the entries on the main diagonal are 1, and all
other entries are 0 except for one, which is nonzero. Call that nonzero entry,
in the ith row and jth column, a. We can get rid of a by multiplying the ith
column by -a and adding the result to the jth column, creating a new matrix
E' = I, as shown in the example below, where i = 2 and j = 3.

$$\text{If}\quad E = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & a & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad\text{then} \tag{4.8.27}$$

$$-a \times \underbrace{\begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}}_{i\text{th column}} = \begin{bmatrix} 0 \\ -a \\ 0 \\ 0 \end{bmatrix}; \quad\text{and}\quad \underbrace{\begin{bmatrix} 0 \\ a \\ 1 \\ 0 \end{bmatrix}}_{j\text{th column}} + \begin{bmatrix} 0 \\ -a \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}. \tag{4.8.28}$$

We know (Equation 4.8.16) that adding a multiple of one column onto another
does not change the determinant, so det E = det I. By property (3) of the
determinant (Equation 4.8.5), det I = 1, so det E = 1. The transpose $E^T$ is
identical to E except that instead of $a_{i,j}$ we have $a_{j,i}$; by the argument above,
$\det E^T = 1$. □

One easy consequence of Theorem 4.8.10 is that a matrix with a row of zeroes has determinant 0.

The fact that determinants are numbers, and that therefore multiplication of determinants is commutative, is much of the point of determinants; essentially everything having to do with matrices that does not involve noncommutativity can be done using determinants.

Recall Corollary 2.5.13: A matrix A and its transpose $A^T$ have the same rank.

We are finally in a position to prove the following result. 

Theorem 4.8.10. For any n x n matrix A,

$$\det A = \det A^T. \tag{4.8.29}$$

Proof. Column reducing a matrix A to echelon form $\tilde A$ is the same as multiplying
it on the right by a succession of elementary matrices $E_1, \dots, E_k$:

$$\tilde A = A(E_1 \cdots E_k). \tag{4.8.30}$$

By Theorem 1.2.17, $(AB)^T = B^T A^T$, so

$$\tilde A^T = (E_1 \cdots E_k)^T A^T. \tag{4.8.31}$$

We need to consider two cases.

First, suppose $\tilde A = I$, the identity. Then $\tilde A^T = I$, and

$$A = E_k^{-1} \cdots E_1^{-1} \quad\text{and}\quad A^T = (E_k^{-1} \cdots E_1^{-1})^T = (E_1^{-1})^T \cdots (E_k^{-1})^T, \tag{4.8.32}$$

so

$$\begin{aligned} \det A &= \det(E_k^{-1} \cdots E_1^{-1}) = \det E_k^{-1} \cdots \det E_1^{-1}; \\ \det A^T &= \det\bigl((E_1^{-1})^T \cdots (E_k^{-1})^T\bigr) = \det (E_1^{-1})^T \cdots \det (E_k^{-1})^T \underbrace{=}_{\text{Theorem 4.8.8}} \det E_1^{-1} \cdots \det E_k^{-1}. \end{aligned} \tag{4.8.33}$$

A determinant is a number, not a matrix, so multiplication of determinants is
commutative: $\det E_1^{-1} \cdots \det E_k^{-1} = \det E_k^{-1} \cdots \det E_1^{-1}$. This gives us $\det A = \det A^T$.

If $\tilde A \neq I$, then rank A < n, so rank $A^T$ < n, so $\det A = \det A^T = 0$. □

One important consequence of Theorem 4.8.10 is that throughout this text, 
whenever we spoke of column operations, we could just as well have spoken of 
row operations. 

Some matrices have a determinant that is easy to compute: the triangular 
matrices (See Definition 1.2.19). 


Theorem 4.8.11. If a matrix is triangular, then its determinant is the product
of the entries along the diagonal.

An alternative proof is sketched in Exercise 4.8.6.

Here are some more characterizations of invertible matrices.

The proof of Theorem 4.8.14 is left to the reader as Exercise 4.8.7.

Proof. We will prove the result for upper triangular matrices; the result for
lower triangular matrices then follows from Theorem 4.8.10. The proof is by
induction. Theorem 4.8.11 is clearly true for a 1 x 1 triangular matrix (note
that any 1 x 1 matrix is triangular). If A is triangular of size n x n with
n > 1, the submatrix $A_{1,1}$ (A with its first row and first column removed) is
also triangular, of size (n - 1) x (n - 1), so we may assume by induction that

$$\det A_{1,1} = a_{2,2} \cdots a_{n,n}. \tag{4.8.34}$$

Since $a_{1,1}$ is the only entry in the first column that can be nonzero, development
according to the first column gives:

$$\det A = (-1)^2 a_{1,1} \det A_{1,1} = a_{1,1} a_{2,2} \cdots a_{n,n}. \quad □ \tag{4.8.35}$$
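As a quick check of Theorem 4.8.11, the sketch below compares the product of the diagonal entries of a triangular matrix with the value given by development according to the first column. The 4 x 4 matrix T and the helper names are our own illustrative choices; the lower triangular case is tested on the transpose.

```python
import math

def det(A):
    """Determinant by development according to the first column."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** i * A[i][0]
               * det([row[1:] for k, row in enumerate(A) if k != i])
               for i in range(len(A)))

def diagonal_product(A):
    """Product of the entries along the main diagonal."""
    return math.prod(A[i][i] for i in range(len(A)))

# A hypothetical upper triangular matrix and its (lower triangular) transpose.
T = [[2, 5, -1, 7],
     [0, 3,  4, 1],
     [0, 0, -1, 6],
     [0, 0,  0, 5]]
T_transpose = [list(column) for column in zip(*T)]

print(det(T), det(T_transpose), diagonal_product(T))   # prints: -30 -30 -30
```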

Theorem 4.8.12. If a matrix A is invertible, then

$$\det A^{-1} = \frac{1}{\det A}. \tag{4.8.36}$$

Proof. This is a simple consequence of Theorem 4.8.7:

$$\det A \det A^{-1} = \det(AA^{-1}) = \det I = 1. \quad □ \tag{4.8.37}$$

The following theorem acquires its real significance in the context of abstract 
vector spaces, but we will find it useful in proving Corollary 4.8.22. 

Theorem 4.8.13. The determinant function is basis independent: if P is the
change-of-basis matrix, then

$$\det A = \det(P^{-1}AP). \tag{4.8.38}$$

Proof. This follows immediately from Theorems 4.8.7 and 4.8.12. □

Theorem 4.8.14. If A is an n x n matrix and B is an m x m matrix, then
for the (n + m) x (n + m) matrix formed with these as diagonal elements,

$$\det \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix} = \det A \det B. \tag{4.8.39}$$


The signature of a permutation 

Some treatments of the determinant start out with the signature of a permutation,
and proceed to define the determinant by Equation 4.8.46. We approached
the problem differently because we wanted to emphasize the effect of row operations
on the determinant, which is easier using our approach.

Recall that a permutation of {1, ..., n} is a one to one map $\sigma : \{1, \dots, n\} \to \{1, \dots, n\}$. One permutation of {1, 2, 3} is {2, 1, 3}; another is {2, 3, 1}. There are several ways of denoting a permutation; the permutation that maps 1 to 2, 2 to 3, and 3 to 1 can be denoted

$$\sigma : \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \mapsto \begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix}.$$

Permutations can be composed: if

$$\sigma : \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \mapsto \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix} \quad\text{and}\quad \tau : \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \mapsto \begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix},$$

then we have

$$\tau \circ \sigma : \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \mapsto \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix} \quad\text{and}\quad \sigma \circ \tau : \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \mapsto \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}.$$

We see that a permutation matrix acts on any element of $\mathbb{R}^n$ by permuting its coordinates.

In the language of group theory, the transformation that associates to a permutation its matrix is called a group homomorphism.

There are a great many possible definitions of the signature of a permutation,
all a bit unsatisfactory.

One definition is to write the permutation as a product of transpositions , a 
transposition being a permutation in which exactly two elements are exchanged. 
Then the signature is + 1 if the number of transpositions is even, and -1 if it is 
odd. The problem with this definition is that there are a great many different 
ways to write a permutation as a product of transpositions, and it isn’t clear 
that they all give the same signature. 

Indeed, showing that different ways of writing a permutation as a product of
transpositions all give the same signature involves something like the existence
part of Theorem 4.8.4; that proof, in Appendix A.15, is distinctly unpleasant.
But armed with this result, we can get the signature almost for free.

First, observe that we can associate to any permutation $\sigma$ of {1, ..., n} its
permutation matrix $M_\sigma$ by the rule

$$(M_\sigma)\vec e_i = \vec e_{\sigma(i)}. \tag{4.8.40}$$


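Example 4.8.15 (Permutation matrix). Suppose we have a permutation
$\sigma$ such that $\sigma(1) = 2$, $\sigma(2) = 3$, and $\sigma(3) = 1$, which we may write

$$\sigma : \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \mapsto \begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix}, \quad\text{or simply } (2, 3, 1).$$

This permutation puts the first coordinate in second place, the second in third
place, and the third in first place, not the first coordinate in third place, the
second in first place, and the third in second place.

The first column of the permutation matrix is $M_\sigma \vec e_1 = \vec e_{\sigma(1)} = \vec e_2$. Similarly,
the second column is $\vec e_3$ and the third column is $\vec e_1$:

$$M_\sigma = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}. \tag{4.8.41}$$

You can easily confirm that this matrix puts the first coordinate of a vector
in $\mathbb{R}^3$ into second position, the second coordinate into third position, and the
third coordinate into first position:

$$\begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} c \\ a \\ b \end{bmatrix}. \tag{4.8.42}$$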

Exercise 4.8.9 asks you to check that the transformation $\sigma \mapsto M_\sigma$ that associates
to a permutation its matrix satisfies $M_{\sigma \circ \tau} = M_\sigma M_\tau$.
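The rule that column i of the permutation matrix is the basis vector with index σ(i), and the identity that the matrix of a composition is the product of the matrices, can both be tried out on small cases. In the sketch below a permutation is written as a tuple of images, so (2, 3, 1) means 1 goes to 2, 2 goes to 3, and 3 goes to 1; the function names are ours, and a check on two permutations is of course not a proof.

```python
def perm_matrix(sigma):
    """Matrix M with M e_i = e_sigma(i); sigma is a tuple of 1-based images."""
    n = len(sigma)
    M = [[0] * n for _ in range(n)]
    for i, image in enumerate(sigma):
        M[image - 1][i] = 1          # column i is the basis vector e_sigma(i)
    return M

def compose(sigma, tau):
    """The composition sigma o tau, i.e. i goes to sigma(tau(i))."""
    return tuple(sigma[t - 1] for t in tau)

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def apply(M, v):
    """Matrix times vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

sigma, tau = (2, 1, 3), (2, 3, 1)
print(perm_matrix(tau))                      # [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
print(apply(perm_matrix(tau), [10, 20, 30])) # [30, 10, 20]
print(compose(sigma, tau), compose(tau, sigma))   # (1, 3, 2) (3, 2, 1)
```

Note that the two compositions differ, so the order of the factors in the matrix product matters as well.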

The determinant of such a permutation matrix is obviously ±1, since by
exchanging rows repeatedly it can be turned into the identity matrix; each time
two rows are exchanged, the sign of the determinant changes.

Definition 4.8.16 (Signature of a permutation). The signature of a
permutation $\sigma$, denoted $\operatorname{sgn}(\sigma)$, is defined by

$$\operatorname{sgn}(\sigma) = \det M_\sigma. \tag{4.8.43}$$

Some authors denote the signature $(-1)^\sigma$.

Permutations of signature +1 are called even permutations, and permutations
of signature -1 are called odd permutations. Almost all properties of the
signature follow immediately from the properties of the determinant; we will
explore them at some length in the exercises.


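Example 4.8.17 (Signatures of permutations). There are six permutations
of the numbers 1, 2, 3:

$$\begin{aligned} \sigma_1 &= (1, 2, 3), & \sigma_2 &= (2, 3, 1), & \sigma_3 &= (3, 1, 2), \\ \sigma_4 &= (1, 3, 2), & \sigma_5 &= (2, 1, 3), & \sigma_6 &= (3, 2, 1). \end{aligned} \tag{4.8.44}$$

Remember that by $\sigma_3 = (3, 1, 2)$ we mean the permutation such that

$$\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \mapsto \begin{bmatrix} 3 \\ 1 \\ 2 \end{bmatrix}.$$

$$\det \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = 1 \det \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} - 0 \det \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} + 0 \det \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} = 1$$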

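The first three permutations are even; the last three are odd. We gave the
permutation matrix for $\sigma_2$ in Example 4.8.15; its determinant is +1. Here are
three more:

$$\det M_{\sigma_1} = \det \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = +1, \quad \det M_{\sigma_3} = \det \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} = +1, \quad \det M_{\sigma_4} = \det \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} = -1. \tag{4.8.45}$$

Exercise 4.8.10 asks you to verify the signature of $\sigma_5$ and $\sigma_6$. △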


Remark. In practice signatures aren’t computed by computing the permutation
matrix. If a permutation is a composition of k transpositions, then the
signature is positive if k is even and negative if k is odd, since each transposition
corresponds to exchanging two columns of the permutation matrix,
and hence changes the sign of the determinant. The second permutation of
Example 4.8.17 has positive signature because two transpositions are required:
exchanging 1 and 3, then exchanging 3 and 2 (or first exchanging 3 and 2, and
then exchanging 1 and 3). △
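The transposition count described in this remark is easy to mechanize. The sketch below is one possible way to do it, with a hypothetical function name: it sorts the tuple of images back to (1, ..., n) by exchanging two entries at a time and returns +1 or -1 according to the parity of the number of exchanges. Other sequences of exchanges would do equally well, since the parity does not depend on the choice.

```python
def signature(sigma):
    """Signature of a permutation given as a tuple of 1-based images,
    found by undoing it with transpositions and counting them."""
    p = list(sigma)
    swaps = 0
    for i in range(len(p)):
        while p[i] != i + 1:
            j = p[i] - 1                 # the position where the value p[i] belongs
            p[i], p[j] = p[j], p[i]      # one transposition
            swaps += 1
    return 1 if swaps % 2 == 0 else -1

six = [(1, 2, 3), (2, 3, 1), (3, 1, 2), (1, 3, 2), (2, 1, 3), (3, 2, 1)]
print([signature(s) for s in six])       # prints: [1, 1, 1, -1, -1, -1]
```

The output matches Example 4.8.17: the first three permutations are even and the last three are odd.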

We can now state one more formula for the determinant. 


Theorem 4.8.18. Let A be an n x n matrix with entries denoted $(a_{i,j})$. Then

$$\det A = \sum_{\sigma \in \operatorname{Perm}(1, \dots, n)} \operatorname{sgn}(\sigma)\, a_{1,\sigma(1)} \cdots a_{n,\sigma(n)}. \tag{4.8.46}$$

Each term of the sum in Equation 4.8.46 is the product of n entries of the matrix A, chosen so that there is exactly one from each row and one from each column; no two are from the same column or the same row. These products are then added together, with an appropriate sign.


In Equation 4.8.46 we are summing over each permutation $\sigma$ of the numbers
1, ..., n. If n = 3, there will be six such permutations, as shown in Example
4.8.17. For each permutation $\sigma$, we see what $\sigma$ does to the numbers 1, ..., n,
and use the result as the second index of the matrix entries. For instance, if
$\sigma(1) = 2$, then $a_{1,\sigma(1)}$ is the entry $a_{1,2}$ of the matrix A.

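Example 4.8.19 (Computing the determinant by permutations). Let
n = 3, and let A be the matrix

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}. \tag{4.8.47}$$

Then we have

$$\begin{array}{lcl} \sigma_1 = (123) & + & a_{1,1}a_{2,2}a_{3,3} = 1 \cdot 5 \cdot 9 = 45 \\ \sigma_2 = (231) & + & a_{1,2}a_{2,3}a_{3,1} = 2 \cdot 6 \cdot 7 = 84 \\ \sigma_3 = (312) & + & a_{1,3}a_{2,1}a_{3,2} = 3 \cdot 4 \cdot 8 = 96 \\ \sigma_4 = (132) & - & a_{1,1}a_{2,3}a_{3,2} = 1 \cdot 6 \cdot 8 = 48 \\ \sigma_5 = (213) & - & a_{1,2}a_{2,1}a_{3,3} = 2 \cdot 4 \cdot 9 = 72 \\ \sigma_6 = (321) & - & a_{1,3}a_{2,2}a_{3,1} = 3 \cdot 5 \cdot 7 = 105 \end{array}$$

So det A = 45 + 84 + 96 - 48 - 72 - 105 = 0. Can you see why this determinant
had to be 0?¹⁰ △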


In Example 4.8.19 it would be quicker to compute the determinant directly,
using Definition 1.4.15. Theorem 4.8.18 does not provide an effective algorithm
for computing determinants; for 2 x 2 and 3 x 3 matrices, which are standard in
the classroom (but not anywhere else), we have explicit and manageable formulas.
When matrices are large, column reduction (Equation 4.8.21) is immeasurably
faster: for a 30 x 30 matrix, roughly the difference between one second and the
age of the universe.
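For small matrices the permutation formula of Theorem 4.8.18 can still be written out directly, which makes the n! growth visible: the loop below runs once per permutation. This is a sketch under our own conventions. Indices are 0-based, and the signature is computed by counting inversions (pairs that appear in the wrong order), a standard equivalent of the parity of the number of transpositions that the text itself does not use.

```python
from itertools import permutations

def signature(sigma):
    """Signature of a 0-based permutation tuple, via its number of inversions
    (a standard equivalent of counting transpositions, assumed here)."""
    n = len(sigma)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if sigma[i] > sigma[j])
    return -1 if inversions % 2 else 1

def det_by_permutations(A):
    """Equation 4.8.46: sum over all permutations of signed products that take
    one entry from each row and each column. A is a list of rows."""
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):     # n! permutations
        term = signature(sigma)
        for i in range(n):
            term *= A[i][sigma[i]]           # the entry in row i, column sigma(i)
        total += term
    return total

print(det_by_permutations([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))   # prints: 0
```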


Proof of Theorem 4.8.18. So as not to prejudice the issue, let us temporarily call the function of Theorem 4.8.18 $D(A)$:

$$D(A) = \sum_{\sigma \in \operatorname{Perm}(1,\dots,n)} \operatorname{sgn}(\sigma)\, a_{1,\sigma(1)} \cdots a_{n,\sigma(n)}. \tag{4.8.48}$$

We will show that the function $D$ has the three properties that characterize the determinant. Normalization is satisfied: $D(I) = 1$, since if $\sigma$ is not the identity, the corresponding product is 0, so the sum above amounts to multiplying together the entries on the diagonal, which are all 1, and assigning the product the signature of the identity, which is $+1$. Multilinearity is straightforward: each term $a_{1,\sigma(1)} \cdots a_{n,\sigma(n)}$ is multilinear as a function of the columns, so any linear combination of such terms is also multilinear.

¹⁰ Denote by $\vec a_1, \vec a_2, \vec a_3$ the columns of $A$. Then $\vec a_3 - \vec a_2 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ and $\vec a_2 - \vec a_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$. So $\vec a_1 - 2\vec a_2 + \vec a_3 = \vec 0$; the columns are linearly dependent, so the matrix is not invertible and its determinant is 0.

Now let's discuss antisymmetry. Let $i \neq j$ be the indices of two columns of an $n \times n$ matrix $A$, and let $\tau$ be the permutation of $\{1, \dots, n\}$ that exchanges them and leaves all the others where they are. Further, denote by $A'$ the matrix formed by exchanging the $i$th and $j$th columns of $A$. Then Equation 4.8.46, applied to the matrix $A'$, gives

$$\begin{aligned}
D(A') &= \sum_{\sigma \in \operatorname{Perm}(1,\dots,n)} \operatorname{sgn}(\sigma)\, a'_{1,\sigma(1)} \cdots a'_{n,\sigma(n)} \\
&= \sum_{\sigma \in \operatorname{Perm}(1,\dots,n)} \operatorname{sgn}(\sigma)\, a_{1,\tau\circ\sigma(1)} \cdots a_{n,\tau\circ\sigma(n)},
\end{aligned} \tag{4.8.49}$$

since the entry of $A'$ in position $(k, l)$ is the same as the entry of $A$ in position $(k, \tau(l))$.

As $\sigma$ runs through all permutations, $\sigma' = \tau \circ \sigma$ does too, so we might as well write

$$D(A') = \sum_{\sigma' \in \operatorname{Perm}(1,\dots,n)} \operatorname{sgn}(\tau^{-1}\circ\sigma')\, a_{1,\sigma'(1)} \cdots a_{n,\sigma'(n)}, \tag{4.8.50}$$

and the result follows from $\operatorname{sgn}(\tau^{-1}\circ\sigma') = \operatorname{sgn}(\tau^{-1})\operatorname{sgn}(\sigma') = -\operatorname{sgn}(\sigma')$, since $\operatorname{sgn}(\tau) = \operatorname{sgn}(\tau^{-1}) = -1$. □

The trace and the derivative of the determinant

Another interesting function of a square matrix is its trace, denoted tr.

Definition 4.8.20 (The trace of a matrix). The trace of an $n \times n$ matrix $A$ is the sum of its diagonal elements:

$$\operatorname{tr} A = a_{1,1} + a_{2,2} + \cdots + a_{n,n}. \tag{4.8.51}$$

Using sum notation, Equation 4.8.51 is $\operatorname{tr} A = \sum_{i=1}^{n} a_{i,i}$.

The trace of $\begin{bmatrix} 1 & 0 & 3 \\ 1 & 2 & 1 \\ 0 & 1 & -1 \end{bmatrix}$ is $1 + 2 + (-1) = 2$.

The trace is easy to compute, much easier than the determinant, and it is a linear function of $A$:

$$\operatorname{tr}(aA + bB) = a \operatorname{tr} A + b \operatorname{tr} B. \tag{4.8.52}$$

The trace doesn’t look as if it has anything to do with the determinant, but 
Theorem 4.8.21 shows that they are closely related. 
Note that in Equation 4.8.53, $[\mathbf{D}\det(I)]$ is the derivative of the determinant function evaluated at $I$. (It should not be read as the derivative of $\det(I)$, which is 0, since $\det(I) = 1$.) In other words, $[\mathbf{D}\det(I)]$ is a linear transformation from $\operatorname{Mat}(n, n)$ to $\mathbb{R}$.

Part (b) is a special case of part 
(c), but it is interesting in its own 
right. We will prove it first, so we 
state it separately. Computing the 
derivative when A is not invertible 
is a bit trickier, and is explored in 
Exercise 4.8.14. 


Theorem 4.8.21 (Derivative of the determinant). (a) The determinant function $\det : \operatorname{Mat}(n, n) \to \mathbb{R}$ is differentiable.

(b) The derivative of the determinant at the identity is given by

$$[\mathbf{D}\det(I)]B = \operatorname{tr} B. \tag{4.8.53}$$

(c) If $\det A \neq 0$, then $[\mathbf{D}\det(A)]B = \det A \operatorname{tr}(A^{-1}B)$.

Proof. (a) By Theorem 4.8.18, the determinant is a polynomial in the entries of the matrix, hence certainly differentiable. (For instance, the formula $ad - bc$ is a polynomial in the variables $a, b, c, d$.)

(b) It is enough to compute directional derivatives, i.e., to evaluate the limit

$$\lim_{h \to 0} \frac{\det(I + hB) - \det I}{h}, \tag{4.8.54}$$

or put another way, to find the terms of

$$\det(I + hB) = \det \begin{bmatrix}
1 + hb_{1,1} & hb_{1,2} & \cdots & hb_{1,n} \\
hb_{2,1} & 1 + hb_{2,2} & \cdots & hb_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
hb_{n,1} & hb_{n,2} & \cdots & 1 + hb_{n,n}
\end{bmatrix} \tag{4.8.55}$$

which are linear in $h$. Equation 4.8.46 shows that if a term has one factor off the diagonal, then it must have at least two (as illustrated for the $2 \times 2$ case in the margin): a permutation that permutes all symbols but one to themselves must take the last symbol to itself also, as it has no other place to go. But all terms off the diagonal contain a factor of $h$, so only the term corresponding to the identity permutation can contribute any linear terms in $h$.

Try the $2 \times 2$ case of Equation 4.8.55:

$$\det\left(I + h\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \det\begin{bmatrix} 1 + ha & hb \\ hc & 1 + hd \end{bmatrix} = (1 + ha)(1 + hd) - h^2 bc = 1 + h(a + d) + h^2(ad - bc).$$

The term corresponding to the identity permutation, which has signature $+1$, is

$$(1 + hb_{1,1})(1 + hb_{2,2}) \cdots (1 + hb_{n,n}) = 1 + h(b_{1,1} + b_{2,2} + \cdots + b_{n,n}) + \cdots + h^n b_{1,1}b_{2,2}\cdots b_{n,n}, \tag{4.8.56}$$

and we see that the coefficient of the linear term is exactly $b_{1,1} + b_{2,2} + \cdots + b_{n,n} = \operatorname{tr} B$.

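(c) Again, take directional derivatives:

$$\begin{aligned}
\lim_{h \to 0} \frac{\det(A + hB) - \det A}{h}
&= \lim_{h \to 0} \frac{\det\big(A(I + hA^{-1}B)\big) - \det A}{h} \\
&= \lim_{h \to 0} \frac{\det A \det(I + hA^{-1}B) - \det A}{h} \\
&= \det A \lim_{h \to 0} \frac{\det(I + hA^{-1}B) - 1}{h} \\
&= \det A \operatorname{tr}(A^{-1}B). \quad \square
\end{aligned} \tag{4.8.57}$$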
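Part (c) can be checked numerically. The sketch below is restricted to $2 \times 2$ matrices so that it needs no libraries; the helper names (`det2`, `inv2`, `matmul2`, `trace`, `directional_derivative_of_det`) and the sample matrices are our own choices, and a central difference stands in for the limit.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv2(m):
    """Inverse of an invertible 2 x 2 matrix."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]


def matmul2(p, q):
    """Product of two 2 x 2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def trace(m):
    """Sum of the diagonal entries."""
    return sum(m[i][i] for i in range(len(m)))


def directional_derivative_of_det(a, b, h=1e-6):
    """Central-difference approximation of lim (det(A + hB) - det A) / h."""
    plus = [[a[i][j] + h * b[i][j] for j in range(2)] for i in range(2)]
    minus = [[a[i][j] - h * b[i][j] for j in range(2)] for i in range(2)]
    return (det2(plus) - det2(minus)) / (2 * h)


A = [[2.0, 1.0], [1.0, 3.0]]
B = [[0.5, -1.0], [4.0, 2.0]]
numeric = directional_derivative_of_det(A, B)
formula = det2(A) * trace(matmul2(inv2(A), B))  # det A tr(A^{-1} B)
print(numeric, formula)  # both are close to 2.5
```

Here $\det(A + hB) = 5 + 2.5h + 5h^2$, so the linear coefficient 2.5 is what both computations should report.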
Equation 4.8.58 looks like Equation 4.8.38 from Theorem 4.8.13, but it is not true for the same reason. Theorem 4.8.13 follows immediately from Theorem 4.8.7:

$$\det(AB) = \det A \det B.$$

This is not true for the trace: the trace of a product is not the product of the traces. Corollary 4.8.22 is usually proved by showing first that $\operatorname{tr} AB = \operatorname{tr} BA$. Exercise 4.8.11 asks you to prove $\operatorname{tr} AB = \operatorname{tr} BA$ algebraically; Exercise 4.8.12 asks you to prove it using 4.8.22.

Theorem 4.8.21 allows easy proofs of many properties of the trace which are not at all obvious from the definition.

Corollary 4.8.22. If $P$ is invertible, then for any matrix $A$ we have

$$\operatorname{tr}(P^{-1}AP) = \operatorname{tr} A. \tag{4.8.58}$$

Proof. This follows from the corresponding result for the determinant (Theorem 4.8.13):

$$\begin{aligned}
\operatorname{tr}(P^{-1}AP) &= \lim_{h \to 0} \frac{\det(I + hP^{-1}AP) - \det I}{h} \\
&= \lim_{h \to 0} \frac{\det\big(P^{-1}(I + hA)P\big) - \det I}{h} \\
&= \lim_{h \to 0} \frac{\det(P^{-1}) \det(I + hA) \det P - \det I}{h} \\
&= \lim_{h \to 0} \frac{\det(I + hA) - \det I}{h} = \operatorname{tr} A. \quad \square
\end{aligned} \tag{4.8.59}$$


4.9 Volumes and determinants 

In this section, we will show that in all dimensions the determinant measures volumes. This generalizes Propositions 1.4.14 and 1.4.20, which concern the determinant in $\mathbb{R}^2$ and $\mathbb{R}^3$.

Recall that “pavable” means “having a well-defined volume,” as stated in Definition 4.1.14.

Figure 4.9.1. The transformation given by $\begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$ turns the square with side length 1 into the square with side length 2. The area of the first is 1; the area of the second is $|\det[T]|$ times 1; i.e., 4.

Theorem 4.9.1 (The determinant measures volume). Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a linear transformation given by the matrix $[T]$. Then for any pavable set $A \subset \mathbb{R}^n$, its image $T(A)$ is pavable, and

$$\operatorname{vol}_n T(A) = |\det[T]| \operatorname{vol}_n A. \tag{4.9.1}$$

The determinant $|\det[T]|$ scales the volume of $A$ up or down to get the volume of $T(A)$; it measures the ratio of the volume of $T(A)$ to the volume of $A$.

Remark. A linear transformation $T$ corresponds to multiplication by the matrix $[T]$. If $A$ is a pavable set, then what does $T(A)$ correspond to in terms of matrix multiplication? It can't be $[T]A$; a matrix can only multiply a matrix or a vector. Applying $T$ to $A$ corresponds to multiplying each point of $A$ by $[T]$. (To do this of course we write points as vectors.) If for example $A$ is the unit square with lower left-hand corner at the origin and $T(A)$ is the square with same left-hand corner but side length 2, as shown in Figure 4.9.1, then $[T]$ is the matrix $\begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$; multiplying $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ by $[T]$ gives $\begin{bmatrix} 2 \\ 0 \end{bmatrix}$, multiplying $\begin{bmatrix} 1/2 \\ 1/2 \end{bmatrix}$ by $[T]$ gives $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$, and so on. △

In Definition 4.9.2, the business with $t_i$ is a precise way of saying that a $k$-parallelogram is the object spanned by $\vec v_1, \dots, \vec v_k$, including its boundary and its inside.

A $k$-dimensional parallelogram, or $k$-parallelogram, is an interval when $k = 1$, a parallelogram when $k = 2$, a parallelepiped when $k = 3$, and higher dimensional analogs when $k > 3$. (We first used the term $k$-parallelepiped; we dropped it when one of our daughters said “piped” made her think of a creature with 3.1415... legs.)

Anchoring $Q$ at the origin is just a convenience; if we cut it from its moorings and let it float freely in $n$-dimensional space, it will still have $n$-dimensional volume 1, which is what we are interested in.

Note that for $T(\mathcal{D}_N)$ to be a paving of $\mathbb{R}^n$ (Definition 4.7.2), $T$ must be invertible. The first requirement for a paving, that

$$\bigcup_{C \in \mathcal{D}_N} T(C) = \mathbb{R}^n,$$

is satisfied because $T$ is onto, and the second, that no two tiles overlap, is satisfied because $T$ is one to one.


For this section and for Chapter 5 we need to define what we mean by a $k$-dimensional parallelogram, also called a $k$-parallelogram.

Definition 4.9.2 ($k$-parallelogram). The $k$-parallelogram spanned by $\vec v_1, \dots, \vec v_k$ is the set of all $t_1\vec v_1 + \cdots + t_k\vec v_k$ with $0 \le t_i \le 1$ for $i$ from 1 to $k$. It is denoted $P(\vec v_1, \dots, \vec v_k)$.

In the proof of Theorem 4.9.1 we will make use of a special case of the $k$-parallelogram: the $n$-dimensional unit cube. While the unit disk is traditionally centered at the origin, our unit cube has one corner anchored at the origin:

Definition 4.9.3 (Unit $n$-dimensional cube). The unit $n$-dimensional cube is the $n$-dimensional parallelogram spanned by $\vec e_1, \dots, \vec e_n$. We will denote it $Q_n$, or, when there is no ambiguity, $Q$.

Note that if we apply a linear transformation $T$ to $Q$, the resulting $T(Q)$ is the $n$-dimensional parallelogram spanned by the columns of $[T]$. This is nothing more than the fact, illustrated in Example 1.2.5, that the $i$th column of a matrix $[T]$ is $[T]\vec e_i$; if the vectors making up $[T]$ are $\vec v_1, \dots, \vec v_n$, this gives $\vec v_i = [T]\vec e_i$, and we can write $T(Q) = P(\vec v_1, \dots, \vec v_n)$.

Proof of Theorem 4.9.1 (The determinant measures volume). If $[T]$ is not invertible, the theorem is true because both sides of Equation 4.9.1 vanish:

$$\operatorname{vol}_n T(A) = |\det[T]| \operatorname{vol}_n A. \tag{4.9.1}$$

The right side vanishes because $\det[T] = 0$ when $[T]$ is not invertible (Theorem 4.8.6). The left side vanishes because if $[T]$ is not invertible, then $T(\mathbb{R}^n)$ is a subspace of $\mathbb{R}^n$ of dimension less than $n$, and $T(A)$ is a bounded subset of this subspace, so (by Proposition 4.3.7) it has $n$-dimensional volume 0.

This leaves the case where $[T]$ is invertible. This proof is much more involved. We will start by denoting by $T(\mathcal{D}_N)$ the paving of $\mathbb{R}^n$ whose blocks are all the $T(C)$ for $C \in \mathcal{D}_N(\mathbb{R}^n)$. We will need to prove the following statements:

(1) The sequence of pavings $T(\mathcal{D}_N)$ is a nested partition.

(2) If $C \in \mathcal{D}_N(\mathbb{R}^n)$, then

$$\operatorname{vol}_n T(C) = \operatorname{vol}_n T(Q) \operatorname{vol}_n C.$$

(3) If $A$ is pavable, then its image $T(A)$ is pavable, and

$$\operatorname{vol}_n T(A) = \operatorname{vol}_n T(Q) \operatorname{vol}_n A.$$

(4) $\operatorname{vol}_n T(Q) = |\det[T]|$.

We will take them in order.

Figure 4.9.2. The potato-shaped area at top is the set $A$; it is mapped by $T$ to its image $T(A)$, at bottom. If $C$ is the small black square in the top figure, $T(C)$ is the small black parallelogram in the bottom figure. The ensemble of all the $T(C)$ for $C$ in $\mathcal{D}_N(\mathbb{R}^n)$ is denoted $T(\mathcal{D}_N)$. The volume of $T(A)$ is the limit of the sum of the volumes of the $T(C)$, where $C \in \mathcal{D}_N(\mathbb{R}^n)$ and $C \subset A$. Each of these has the same volume

$$\operatorname{vol}_n T(C) = \operatorname{vol}_n C \operatorname{vol}_n T(Q).$$


Lemma 4.9.4. The sequence of pavings $T(\mathcal{D}_N)$ is a nested partition.

Proof of Lemma 4.9.4. We must check the three conditions of Definition 4.7.4 of a nested partition. The first condition is that small paving pieces must fit inside big paving pieces: if we pave $\mathbb{R}^n$ with blocks $T(C)$, then if

$$C_1 \in \mathcal{D}_{N_1}(\mathbb{R}^n), \quad C_2 \in \mathcal{D}_{N_2}(\mathbb{R}^n), \quad \text{and } C_1 \subset C_2, \tag{4.9.2}$$

we have

$$T(C_1) \subset T(C_2). \tag{4.9.3}$$

This is clearly met: for example, if you divide the square $A$ of Figure 4.9.1 into four smaller squares, the image of each small square will fit inside $T(A)$.

We use the linearity of $T$ in meeting the second and third conditions. The second condition is that the boundary of the sequence of pavings must have $n$-dimensional volume 0. The boundary $\partial\mathcal{D}_N(\mathbb{R}^n)$ is a union of subspaces of dimension $n - 1$, hence $\partial T(\mathcal{D}_N(\mathbb{R}^n))$ is also. Moreover, only finitely many intersect any bounded subset of $\mathbb{R}^n$, so (by Corollary 4.3.7) the second condition is satisfied.

The third condition is that the pieces $T(C)$ shrink to points as $N \to \infty$. This is also met: since

$$\operatorname{diam}(C) = \frac{\sqrt{n}}{2^N} \text{ when } C \in \mathcal{D}_N(\mathbb{R}^n), \text{ we have } \operatorname{diam}(T(C)) \le |T| \frac{\sqrt{n}}{2^N}. \tag{4.9.4}$$

So $\operatorname{diam}(T(C)) \to 0$ as $N \to \infty$. □

Proof of Theorem 4.9.1: second statement.

Now for the second statement. Recall that $Q$ is the unit ($n$-dimensional) cube, with $n$-dimensional volume 1. We will now show that $T(Q)$ is pavable, as are all $T(C)$ for $C \in \mathcal{D}_N(\mathbb{R}^n)$. Since $C$ is $Q$ scaled up or down by $2^N$ in all directions, and $T(C)$ is $T(Q)$ scaled by the same factor, we have

$$\frac{\operatorname{vol}_n T(C)}{\operatorname{vol}_n T(Q)} = \frac{\operatorname{vol}_n C}{\operatorname{vol}_n Q} = \frac{\operatorname{vol}_n C}{1}, \tag{4.9.5}$$

which we can write

$$\operatorname{vol}_n T(C) = \operatorname{vol}_n T(Q) \operatorname{vol}_n(C). \tag{4.9.6}$$

¹¹ If this is not clear, consider that for any points $\mathbf a$ and $\mathbf b$ in $C$ (which we can think of as joined by the vector $\vec v$),

$$|T(\mathbf a) - T(\mathbf b)| = |T(\mathbf a - \mathbf b)| = |[T]\vec v| \le |[T]|\,|\vec v| \quad \text{(Prop. 1.4.11)}.$$

So the diameter of $T(C)$ can be at most $|[T]|$ times the length of the longest vector joining two points of $C$: i.e., $\sqrt{n}/2^N$.
Proof of Theorem 4.9.1: third statement. We know that $A$ is pavable; as illustrated in Figure 4.9.2, we can compute its volume by taking the limit of the lower sum (the cubes $C \in \mathcal{D}_N$ that are entirely inside $A$) or the limit of the upper sum (the cubes either entirely inside $A$ or straddling $A$).

Since $T(\mathcal{D}_N)$ is a nested partition, we can use it as a paving to measure the volume of $T(A)$, with upper and lower sums:

$$\begin{aligned}
\underbrace{\sum_{T(C) \cap T(A) \neq \emptyset} \operatorname{vol}_n T(C)}_{\text{upper sum for } \chi_{T(A)}}
&= \sum_{C \cap A \neq \emptyset} \underbrace{\operatorname{vol}_n(C) \operatorname{vol}_n T(Q)}_{= \operatorname{vol}_n T(C) \text{ by Eq. 4.9.6}}
= \operatorname{vol}_n T(Q) \underbrace{\sum_{C \cap A \neq \emptyset} \operatorname{vol}_n C}_{\text{limit is } \operatorname{vol}_n(A)}; \\
\underbrace{\sum_{T(C) \subset T(A)} \operatorname{vol}_n T(C)}_{\text{lower sum for } \chi_{T(A)}}
&= \sum_{C \subset A} \operatorname{vol}_n(C) \operatorname{vol}_n T(Q)
= \operatorname{vol}_n T(Q) \underbrace{\sum_{C \subset A} \operatorname{vol}_n C}_{\text{limit is } \operatorname{vol}_n(A)}.
\end{aligned} \tag{4.9.7}$$

In Equation 4.9.7, the limit of the sums on the left is $\operatorname{vol}_n T(A)$.

In reading Equation 4.9.7, it's important to pay attention to which $C$'s one is summing over:

$C \cap A \neq \emptyset$ = $C$'s in $A$ or straddling $A$
$C \subset A$ = $C$'s entirely in $A$

Subtracting the second from the first gives $C$'s straddling $A$.

Subtracting the lower sum from the upper sum, we get

$$U_N(\chi_{T(A)}) - L_N(\chi_{T(A)}) = \sum_{\substack{C \text{ straddles} \\ \text{boundary of } A}} \operatorname{vol}_n T(C) = \operatorname{vol}_n T(Q) \sum_{\substack{C \text{ straddles} \\ \text{boundary of } A}} \operatorname{vol}_n C.$$

Since $A$ is pavable, the right-hand side can be made arbitrarily small, so $T(A)$ is also pavable, and

$$\operatorname{vol}_n T(A) = \operatorname{vol}_n T(Q) \operatorname{vol}_n A. \tag{4.9.8}$$

You may recall that in $\mathbb{R}^2$ and especially $\mathbb{R}^3$ the proof that the determinant measures volume was a difficult computation. In $\mathbb{R}^n$, such a computational proof is out of the question.

Exercise 4.9.1 suggests a different proof: showing that $\operatorname{vol}_n T(Q)$ satisfies the axiomatic definition of the absolute value of determinant, Definition 4.8.1.

What does $E(A)$ mean when the set $A$ is defined in geometric terms, as above? If you find this puzzling, look again at Figure 4.9.1. We think of $E$ as a transformation; applying that transformation to $A$ means multiplying each point of $A$ by $E$ to obtain the corresponding point of $E(A)$.

Proof of Theorem 4.9.1: fourth statement. This leaves (4): why is $\operatorname{vol}_n T(Q)$ the same as $|\det[T]|$? There is no obvious relation between volumes and the immensely complicated formula for the determinant. Our strategy will be to reduce the theorem to the case where $T$ is given by an elementary matrix, since the determinant of elementary matrices is straightforward.

The following lemma is the key to reducing the problem to the case of elementary matrices.

Lemma 4.9.5. If $S, T : \mathbb{R}^n \to \mathbb{R}^n$ are linear transformations, then

$$\operatorname{vol}_n (S \circ T)(Q) = \operatorname{vol}_n S(Q) \operatorname{vol}_n T(Q). \tag{4.9.9}$$

Proof of Lemma 4.9.5. This follows from Equation 4.9.8, substituting $S$ for $T$ and $T(Q)$ for $A$:

$$\operatorname{vol}_n (S \circ T)(Q) = \operatorname{vol}_n S(T(Q)) = \operatorname{vol}_n S(Q) \operatorname{vol}_n T(Q). \quad \square \tag{4.9.10}$$
The second type of elementary matrix, in $\mathbb{R}^2$, simply takes the unit square $Q$ to a parallelogram with base 1 and height 1.

Figure 4.9.4. Here $n = 7$; from the $7 \times 7$ matrix $E$ we created the $2 \times 2$ matrix $E_1 = \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix}$ and the $5 \times 5$ identity matrix $E_2$.

Any invertible linear transformation $T$, identified to its matrix, can be written as the product of elementary matrices,

$$[T] = E_k E_{k-1} \cdots E_1, \tag{4.9.11}$$

since $[T]$ row reduces to the identity. So by Lemma 4.9.5, it is enough to prove (4) for elementary matrices: i.e., to prove

$$\operatorname{vol}_n E(Q) = |\det E|. \tag{4.9.12}$$


Elementary matrices come in three kinds, as described in Definition 2.3.5. (Here we discuss them in terms of columns, as we did in Section 4.8, not in terms of rows.)

(1) If $E$ is a type 1 elementary matrix, multiplying a column by a nonzero number $m$, then $\det E = m$ (Corollary 4.8.9), and Equation 4.9.12 becomes $\operatorname{vol}_n E(Q) = |m|$. This result was proved in Proposition 4.1.16, because $E(Q)$ is then a parallelepiped all of whose sides are 1 except one side, whose length is $|m|$.

(2) The case where $E$ is type 2, adding a multiple of one column onto another, is a bit more complicated. Without loss of generality, we may assume that a multiple of the first is being added to the second.

First let us verify it for the case $n = 2$, where $E$ is the matrix

$$E = \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix}, \text{ with } \det E = 1. \tag{4.9.13}$$

As shown in Figure 4.9.3, the image of the unit cube, $E(Q)$, is then a parallelogram still with base 1 and height 1, so $\operatorname{vol}(E(Q)) = |\det E| = 1$.¹²

If $n > 2$, write $\mathbb{R}^n = \mathbb{R}^2 \times \mathbb{R}^{n-2}$. Correspondingly, we can write $Q = Q_1 \times Q_2$, and $E = E_1 \times E_2$, where $E_2$ is the identity, as shown in Figure 4.9.4.

Then by Proposition 4.1.12,

$$\operatorname{vol}_n(E(Q)) = \operatorname{vol}_2(E_1(Q_1)) \operatorname{vol}_{n-2}(Q_2) = 1 \cdot 1 = 1. \tag{4.9.14}$$

(3) If $E$ is type 3, then $|\det E| = 1$, so that Equation 4.9.12 becomes $\operatorname{vol}_n E(Q) = 1$. Indeed, since $E(Q)$ is just $Q$ with vertices relabeled, its volume is 1. □

¹² But is this a proof? Are we using our definition of volume (area in this case) using pavings, or some “geometric intuition,” which is right but difficult to justify precisely? One rigorous justification uses Fubini's theorem:

$$\operatorname{vol} E(Q) = \int_0^1 \left( \int_{ay}^{ay+1} dx \right) dy = 1.$$

Another possibility is suggested in Exercise 4.9.2.
Figure 4.9.5. The linear transformation $\begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}$ takes the unit circle to the ellipse shown above.

Note that after knowing $\operatorname{vol}_n T(Q) = |\det[T]|$, Equation 4.9.9 becomes

$$|\det[S]|\,|\det[T]| = |\det[ST]|. \tag{4.9.15}$$

Of course, this was clear from Theorem 4.8.7. But that result did not have a 
very transparent proof, whereas Equation 4.9.9 has a clear geometric meaning. 
Thus this interpretation of the determinant as a volume gives a reason why 
Theorem 4.8.7 should be true. 
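In the plane, both statements can be checked directly. The sketch below is an illustration only: the matrices `S` and `T` are arbitrary choices of ours, and the area of the image of the unit square is computed with the standard shoelace formula for polygons rather than with pavings.

```python
def det2(t):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return t[0][0] * t[1][1] - t[0][1] * t[1][0]


def matmul2(p, q):
    """Product of two 2 x 2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply(t, v):
    """Multiply the point v, written as a vector, by the matrix t."""
    return (t[0][0] * v[0] + t[0][1] * v[1], t[1][0] * v[0] + t[1][1] * v[1])


def shoelace_area(points):
    """Unsigned area of a polygon whose vertices are listed in order."""
    s = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2


UNIT_SQUARE = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]


def image_area(t):
    """Area of T(Q), the image of the unit square under t."""
    return shoelace_area([apply(t, p) for p in UNIT_SQUARE])


S = [[2.0, 1.0], [0.0, 3.0]]
T = [[1.0, -1.0], [2.0, 0.5]]
print(image_area(T), abs(det2(T)))                        # 2.5 2.5
print(image_area(matmul2(S, T)), image_area(S) * image_area(T))  # 15.0 15.0
```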


Linear change of variables 

It is always more or less equivalent to speak about volumes or to speak about 
integrals; translating Theorem 4.9.1 (“the determinant measures volume”) into 
the language of integrals gives the following theorem. 


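Theorem 4.9.6 (Linear change of variables). Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be an invertible linear transformation, and $f : \mathbb{R}^n \to \mathbb{R}$ an integrable function. Then $f \circ T$ is integrable, and

$$\int_{\mathbb{R}^n} f(\mathbf y)\,|d^n\mathbf y| = \underbrace{|\det T|}_{\substack{\text{corrects for} \\ \text{stretching by } T}} \int_{\mathbb{R}^n} \underbrace{f(T(\mathbf x))}_{f(\mathbf y)}\,|d^n\mathbf x|, \tag{4.9.16}$$

where $\mathbf x$ is the variable of the first $\mathbb{R}^n$ and $\mathbf y$ is the variable of the second $\mathbb{R}^n$.

In Equation 4.9.16, $|\det T|$ corrects for the distortion induced by $T$.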


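Example 4.9.7 (Linear change of variables). The linear transformation given by $T = \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}$ transforms the unit circle into an ellipse, as shown in Figure 4.9.5. The area of the ellipse is then given by

$$\text{Area of ellipse} = \int_{\text{ellipse}} |d^2\mathbf y| = \left|\det\begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}\right| \underbrace{\int_{\text{circle}} |d^2\mathbf x|}_{\pi\,=\,\text{area of circle}} = |ab|\pi. \tag{4.9.17}$$

If we had integrated some function $f : \mathbb{R}^2 \to \mathbb{R}$ over the unit circle and wanted to know what the same function would give when integrated over the ellipse, we would use the formula

$$\int_{\text{ellipse}} f(\mathbf y)\,|d^2\mathbf y| = |ab| \int_{\text{circle}} f\begin{pmatrix} a x_1 \\ b x_2 \end{pmatrix} |d^2\mathbf x|. \quad \triangle \tag{4.9.18}$$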
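The area $|ab|\pi$ can be compared with a crude grid count, in the spirit of the pavings used in this chapter. This is a sketch under our own assumptions: the function name `ellipse_area_by_grid`, the grid size and the sample values $a = 3$, $b = 2$ are illustrative, and counting cells by their midpoints only approximates the area.

```python
import math


def ellipse_area_by_grid(a, b, n=400):
    """Estimate the area of {(y1, y2) : (y1/a)^2 + (y2/b)^2 <= 1}.

    The bounding rectangle is cut into n * n cells; a cell is counted
    when its midpoint lies inside the ellipse.
    """
    half_w, half_h = abs(a), abs(b)
    dx, dy = 2 * half_w / n, 2 * half_h / n
    count = 0
    for i in range(n):
        y1 = -half_w + (i + 0.5) * dx
        for j in range(n):
            y2 = -half_h + (j + 0.5) * dy
            if (y1 / a) ** 2 + (y2 / b) ** 2 <= 1.0:
                count += 1
    return count * dx * dy


a, b = 3.0, 2.0
estimate = ellipse_area_by_grid(a, b)
predicted = abs(a * b) * math.pi  # |det T| times the area of the unit disk
print(estimate, predicted)
```

The two printed numbers should agree to a couple of decimal places; refining the grid brings the estimate closer.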
Figure 4.9.6. The vectors $\vec a$, $\vec b$ span a parallelogram of positive area; the vectors $\vec b$ and $\vec a$ span a parallelogram of negative area.

Alternatively, we can say that just as the volume of $T(Q)$ is $|\det T|$, the signed volume of $T(Q)$ is $\det T$.

Of course, “counterclockwise” is not a mathematical term; finding that the determinant of some $2 \times 2$ matrix is positive cannot tell you in which direction the arms of your clock move. What this really means is that the smallest angle from $\vec v_1$ to $\vec v_2$ should be in the same direction as the smallest angle from $\vec e_1$ to $\vec e_2$.


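Proof of Theorem 4.9.6.

$$\begin{aligned}
\int_{\mathbb{R}^n} f(T(\mathbf x))\,|\det T|\,|d^n\mathbf x|
&= \lim_{N \to \infty} \sum_{C \in \mathcal{D}_N(\mathbb{R}^n)} M_C\big((f \circ T)|\det T|\big) \operatorname{vol}_n(C) \\
&= \lim_{N \to \infty} \sum_{C \in \mathcal{D}_N(\mathbb{R}^n)} M_C(f \circ T) \underbrace{\operatorname{vol}_n(T(C))}_{= |\det T| \operatorname{vol}_n(C)} \\
&= \lim_{N \to \infty} \sum_{P \in T(\mathcal{D}_N(\mathbb{R}^n))} M_P(f) \operatorname{vol}_n(P) = \int_{\mathbb{R}^n} f(\mathbf y)\,|d^n\mathbf y|. \quad \square
\end{aligned} \tag{4.9.19}$$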


Signed volumes 

The fact that the absolute value of the determinant is the volume of the image 
of the unit cube allows us to define the notion of signed volume. 


Definition 4.9.8 (Signed volume). The signed $k$-dimensional volume of the parallelepiped spanned by $\vec v_1, \dots, \vec v_k \in \mathbb{R}^k$ is the determinant

$$\det \begin{bmatrix} \vec v_1 & \cdots & \vec v_k \end{bmatrix}. \tag{4.9.20}$$

Thus the determinant not only measures volume; it also attributes a sign to the volume. In $\mathbb{R}^2$, two vectors $\vec v_1$ and $\vec v_2$, in that order, span a parallelogram of positive area if and only if the smallest angle from $\vec v_1$ to $\vec v_2$ is counterclockwise, as shown in Figure 4.9.6.

In $\mathbb{R}^3$, three vectors, $\vec v_1$, $\vec v_2$, and $\vec v_3$, in that order, span a parallelepiped of positive signed volume if and only if they form a right-handed coordinate system. Again, what we really mean is that the same hand that fits $\vec v_1$, $\vec v_2$, and $\vec v_3$ will fit $\vec e_1$, $\vec e_2$, and $\vec e_3$; it is by convention that they are drawn counterclockwise, to accommodate the right hand.
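For two vectors in the plane, the signed area is a one-line computation. The sketch below uses a hypothetical helper `signed_area` and sample vectors of our own choosing; it shows the sign flipping when the order of the vectors is exchanged, as in Figure 4.9.6.

```python
def signed_area(v1, v2):
    """Determinant of the 2 x 2 matrix whose columns are v1 and v2."""
    return v1[0] * v2[1] - v1[1] * v2[0]


a = (2.0, 0.0)
b = (1.0, 1.0)
print(signed_area(a, b))  # 2.0: the smallest angle from a to b is counterclockwise
print(signed_area(b, a))  # -2.0: same parallelogram, opposite order
```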


4.10 The Change of Variables Formula 

We discussed linear changes of variables in higher dimensions in Section 
4.9. This section is devoted to nonlinear changes of variables in higher dimen- 
sions. You will no doubt have run into changes of variables in one-dimensional 
integrals, perhaps under the name of the substitution method in methods of 
integration theory. 
Example 4.10.1 (Change of variables in one dimension: substitution method). To compute

$$\int_0^\pi \sin x\, e^{\cos x}\,dx, \tag{4.10.1}$$

traditionally, one says: set

$$u = \cos x, \text{ so that } du = -\sin x\,dx. \tag{4.10.2}$$

Then when $x = 0$, we have $u = \cos 0 = 1$, and when $x = \pi$, we have $u = \cos\pi = -1$, so

$$\int_0^\pi \sin x\, e^{\cos x}\,dx = \int_1^{-1} -e^u\,du = \int_{-1}^{1} e^u\,du = e - \frac{1}{e}. \quad \triangle \tag{4.10.3}$$

The meaning of expressions like $du$ is explored in Chapter 6. We will see that we can use the change of variables formula in higher dimensions without requiring exact correspondence of domains, but for this we will have to develop the language of forms. You will then find that this is what you were using (more or less blindly) in one dimension.

In this section we want to generalize this sort of computation to several variables. There are two parts to this: transforming the integrand, and transforming the domain of integration. In Example 4.10.1 we transformed the integrand by setting $u = \cos x$, so that $du = -\sin x\,dx$ (whatever $du$ means), and we transformed the domain of integration by noting that $x = 0$ corresponds to $u = \cos 0 = 1$, and $x = \pi$ corresponds to $u = \cos\pi = -1$.
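The substitution of Example 4.10.1 can be sanity-checked numerically. The sketch below assumes nothing beyond a midpoint Riemann sum; the helper name `midpoint_integral` and the number of subintervals are our own choices.

```python
import math


def midpoint_integral(f, lo, hi, n=10_000):
    """Midpoint Riemann sum of f over [lo, hi] with n equal subintervals."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))


# Left side of Equation 4.10.3: the integral in the variable x.
original = midpoint_integral(lambda x: math.sin(x) * math.exp(math.cos(x)), 0.0, math.pi)
# After the substitution u = cos x: the integral of e^u from -1 to 1.
substituted = midpoint_integral(math.exp, -1.0, 1.0)
exact = math.e - 1 / math.e
print(original, substituted, exact)
```

All three numbers should agree to several decimal places.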

Both parts are harder in several variables, especially the second. In one 
dimension, the domain of integration is usually an interval, and it is not too 
hard to see how intervals correspond. Domains of integration in R n , even in the 
traditional cases of disks, sectors, balls, cylinders, etc., are quite a bit harder 
to handle. Much of our treatment will be concerned with making precise the 
“correspondences of domains” under change of variables. 

There is another difference between the way you probably learned the change 
of variables formula in one dimension, and the way we will present it now in 
higher dimensions. The way it is typically presented in one dimension makes the 
conceptual basis harder but the computations easier. In particular, you didn’t 
have to make the domains correspond exactly; it was enough if the endpoints 
matched. Now we will have to make sure our domains correspond precisely, 
which will complicate our computations. 

Three important changes of variables 

Before stating the change of variables formula in general, we will first explore
what it says for polar coordinates in the plane, and spherical and cylindrical 
coordinates in space. This will help you understand the general case. In ad- 
dition, many real systems (encountered for instance in physics courses) have a 
central symmetry in the plane or in space, or an axis of symmetry in space, 
and in all those cases, these particular changes of variables are the useful ones. 
Finally, a great many of the standard multiple integrals are computed using 
these changes of variables. 
Polar coordinates

Definition 4.10.2 (Polar coordinates map). The polar coordinate map $P$ maps a point in the $(r, \theta)$-plane to a point in the $(x, y)$-plane:

$$P : \begin{pmatrix} r \\ \theta \end{pmatrix} \mapsto \begin{pmatrix} x = r\cos\theta \\ y = r\sin\theta \end{pmatrix}, \tag{4.10.4}$$

where $r$ measures distance from the origin along the spokes, and the polar angle $\theta$ measures the angle (in radians) formed by a spoke and the positive $x$ axis.

Thus, as shown in Figure 4.10.1, a rectangle in the domain of $P$ becomes a curvilinear “rectangle” in the image of $P$.

FIGURE 4.10.1. The polar coordinate map $P$ maps the rectangle at left, with dimensions $\Delta r$ and $\Delta\theta$, to the curvilinear box at right, with two straight sides of length $\Delta r$ and two curved sides measuring $r\Delta\theta$ (for different values of $r$).

Proposition 4.10.3 (Change of variables for polar coordinates). Suppose $f$ is an integrable function defined on $\mathbb{R}^2$, and suppose that the polar coordinate map $P$ maps a region $B \subset (0, \infty) \times [0, 2\pi)$ of the $(r, \theta)$-plane to a region $A$ in the $(x, y)$-plane. Then

$$\int_A f\begin{pmatrix} x \\ y \end{pmatrix} |dx\,dy| = \int_B f\begin{pmatrix} r\cos\theta \\ r\sin\theta \end{pmatrix} r\,|dr\,d\theta|. \tag{4.10.5}$$

In Equation 4.10.5, the $r$ in $r\,dr\,d\theta$ plays the role of $|\det T|$ in the linear change of variables formula (Theorem 4.9.6): it corrects for the distortion induced by the polar coordinate map $P$. We could put $|\det T|$ in front of the integral in the linear formula because it is a constant. Here, we cannot put $r$ in front of the integral: since $P$ is nonlinear, the amount of distortion is not constant but depends on the point at which $P$ is applied.

In Equation 4.10.5, we could replace $\int_B f\begin{pmatrix} r\cos\theta \\ r\sin\theta \end{pmatrix}$ by $\int_B f\left(P\begin{pmatrix} r \\ \theta \end{pmatrix}\right)$, which is the format we used in Theorem 4.9.6 concerning the linear case.

Note that the mapping $P : B \to A$ is necessarily bijective (one to one and onto), since we required $\theta \in [0, 2\pi)$. Moreover, to every $A$ there corresponds such a $B$, except that $\mathbf 0$ should not belong to $A$ (since there is no well-defined polar angle at the origin). This restriction does not matter: the behavior of an integrable function on a set of volume 0 does not affect integrals (Theorem 4.3.10). Requiring that $\theta$ belong to $[0, 2\pi)$ is essentially arbitrary; the interval $[-\pi, \pi)$ would have done just as well. Moreover, there is no need to worry about what happens when $\theta = 0$ or $\theta = 2\pi$, since those are also sets of volume 0.

We will postpone the discussion of where Equation 4.10.5 comes from, and proceed to some examples.

This was originally computed by Archimedes, who invented a lot of the integral calculus in the process. No one understood what he was doing for about 2000 years.

Figure 4.10.2. In Example 4.10.4 we are measuring the region inside the cylinder and outside the paraboloid.

Figure 4.10.3. The lemniscate of equation $r^2 = \cos 2\theta$.


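Example 4.10.4 (Volume beneath a paraboloid of revolution). Consider the paraboloid of Figure 4.10.2, given by

$$z = f\begin{pmatrix} x \\ y \end{pmatrix} = \begin{cases} x^2 + y^2 & \text{if } x^2 + y^2 \le R^2 \\ 0 & \text{if } x^2 + y^2 > R^2. \end{cases} \tag{4.10.6}$$

Usually one would write the integral

$$\int_{\mathbb{R}^2} f\begin{pmatrix} x \\ y \end{pmatrix} |dx\,dy| \quad \text{as} \quad \int_{D_R} (x^2 + y^2)\,dx\,dy, \tag{4.10.7}$$

where

$$D_R = \left\{ \begin{pmatrix} x \\ y \end{pmatrix} \in \mathbb{R}^2 \;\middle|\; x^2 + y^2 \le R^2 \right\} \tag{4.10.8}$$

is the disk of radius $R$ centered at the origin.

This integral is fairly complicated to compute using Fubini's theorem; Exercise 4.10.1 asks you to do this. Using the change of variables formula 4.10.5, it is straightforward:

$$\begin{aligned}
\int_{\mathbb{R}^2} f\begin{pmatrix} x \\ y \end{pmatrix} |dx\,dy|
&= \int_0^{2\pi}\!\!\int_0^R (r^2)(\cos^2\theta + \sin^2\theta)\, r\,dr\,d\theta \\
&= \int_0^{2\pi}\!\!\int_0^R (r^2)\, r\,dr\,d\theta = 2\pi \left[ \frac{r^4}{4} \right]_0^R = \frac{\pi R^4}{2}. \quad \triangle
\end{aligned} \tag{4.10.9}$$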

Most often, polar coordinates are used when the domain of integration is a 
disk or a sector of a disk, but they are also useful in many cases where the 
equation of the boundary is well suited to polar coordinates, as in Example 
4.10.5. 
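The right-hand side of Equation 4.10.5 lends itself to a quick numerical check on Example 4.10.4. The following sketch is illustrative only: `polar_integral`, the grid sizes and the sample radius $R = 2$ are our own choices, and a midpoint Riemann sum over the rectangle $(0, R) \times [0, 2\pi)$ replaces the exact integral.

```python
import math


def polar_integral(f, r_max, n_r=200, n_theta=200):
    """Midpoint sum of f(r cos t, r sin t) * r over (0, r_max) x [0, 2 pi)."""
    dr = r_max / n_r
    dt = 2 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_theta):
            t = (j + 0.5) * dt
            # The extra factor r corrects for the distortion induced by P.
            total += f(r * math.cos(t), r * math.sin(t)) * r
    return total * dr * dt


R = 2.0
approx = polar_integral(lambda x, y: x * x + y * y, R)
exact = math.pi * R ** 4 / 2  # the value found in Equation 4.10.9
print(approx, exact)
```

Dropping the factor `r` from the sum gives a visibly different (and wrong) number, which is one way to see that the factor is needed.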

Example 4.10.5 (Area of a lemniscate). The lemniscate looks like a figure eight; the name comes from the Latin word for ribbon. We will compute the area of the right-hand lobe $A$ of the lemniscate given by the equation $r^2 = \cos 2\theta$, i.e., the area bounded by the right loop of the figure eight shown in Figure 4.10.3. (Exercise 4.10.2 asks you to write the equation of the lemniscate in complex notation.)
The formula for change of variables for polar coordinates, Equation 4.10.5, has a function $f$ on both sides of the equation. Since we are computing area here, the function is simply 1.

Of course this area can be written $\int_A dx\,dy$, which could be computed by Riemann sums, but the expressions you get applying Fubini's theorem are dismayingly complicated. Using polar coordinates simplifies the computations.

The region $A$ (the right lobe) corresponds to the region $B$ in the $(r, \theta)$-plane where

$$B = \left\{ \begin{pmatrix} r \\ \theta \end{pmatrix} \;\middle|\; -\frac{\pi}{4} \le \theta \le \frac{\pi}{4},\; 0 < r \le \sqrt{\cos 2\theta} \right\}. \tag{4.10.10}$$

Thus in polar coordinates, the integral becomes

$$\begin{aligned}
\int_{-\pi/4}^{\pi/4} \left( \int_0^{\sqrt{\cos 2\theta}} r\,dr \right) d\theta
&= \int_{-\pi/4}^{\pi/4} \left[ \frac{r^2}{2} \right]_0^{\sqrt{\cos 2\theta}} d\theta
= \int_{-\pi/4}^{\pi/4} \frac{\cos 2\theta}{2}\,d\theta \\
&= \left[ \frac{\sin 2\theta}{4} \right]_{-\pi/4}^{\pi/4} = \frac{1}{2}. \quad \triangle
\end{aligned} \tag{4.10.11}$$

Spherical coordinates

Spherical coordinates are important whenever you have a center of symmetry in $\mathbb{R}^3$.

Figure 4.10.4. In spherical coordinates, a point is specified by its distance from the origin ($r$), its longitude ($\theta$), and its latitude ($\varphi$); longitude and latitude are measured in radians, not in degrees.

The $r^2\cos\varphi$ corrects for distortion induced by the mapping.


Definition 4.10.6 (Spherical coordinates map). The spherical coordinate map $S$ maps a point in space (e.g., a point inside the earth) known by its distance $r$ from the center, its longitude $\theta$, and its latitude $\varphi$, to a point in $(x, y, z)$-space:

$$S : \begin{pmatrix} r \\ \theta \\ \varphi \end{pmatrix} \mapsto \begin{pmatrix} x = r\cos\theta\cos\varphi \\ y = r\sin\theta\cos\varphi \\ z = r\sin\varphi \end{pmatrix}. \tag{4.10.12}$$

This is illustrated by Figure 4.10.4.

Proposition 4.10.7 (Change of variables for spherical coordinates). Suppose $f$ is an integrable function defined on $\mathbb{R}^3$, and suppose that the spherical coordinate map $S$ maps a region $B$ of the $(r, \theta, \varphi)$-space to a region $A$ in the $(x, y, z)$-space. Further, suppose that $B \subset (0, \infty) \times [0, 2\pi) \times (-\pi/2, \pi/2)$. Then

$$\int_A f\begin{pmatrix} x \\ y \\ z \end{pmatrix} dx\,dy\,dz = \int_B f\begin{pmatrix} r\cos\theta\cos\varphi \\ r\sin\theta\cos\varphi \\ r\sin\varphi \end{pmatrix} r^2\cos\varphi\,dr\,d\theta\,d\varphi. \tag{4.10.13}$$


Again, we will postpone the justification for this formula. 
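Example 4.10.8 (Spherical coordinates). Integrate the function $z$ over the upper half of the unit ball:

$$\int_A z\,dx\,dy\,dz, \tag{4.10.14}$$

where $A$ is the upper half of the unit ball, i.e., the region

$$A = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \in \mathbb{R}^3 \;\middle|\; x^2 + y^2 + z^2 \le 1,\; z \ge 0 \right\}. \tag{4.10.15}$$

The region $B$ corresponding to this region under $S$ is

$$B = \left\{ \begin{pmatrix} r \\ \theta \\ \varphi \end{pmatrix} \in (0, \infty) \times [0, 2\pi) \times (-\pi/2, \pi/2) \;\middle|\; r \le 1,\; \varphi \ge 0 \right\}. \tag{4.10.16}$$

Thus our integral becomes

$$\begin{aligned}
\int_B (r\sin\varphi)(r^2\cos\varphi)\,dr\,d\theta\,d\varphi
&= \int_0^1 r^3 \left( \int_0^{\pi/2} \sin\varphi\cos\varphi \left( \int_0^{2\pi} d\theta \right) d\varphi \right) dr \\
&= 2\pi \int_0^1 r^3 \left[ \frac{\sin^2\varphi}{2} \right]_0^{\pi/2} dr = 2\pi \int_0^1 \frac{r^3}{2}\,dr = \frac{\pi}{4}. \quad \triangle
\end{aligned} \tag{4.10.17}$$

For spherical coordinates, many authors use the angle from the North Pole rather than latitude. Mainly because most people are comfortable with the standard latitude, we prefer this form. The formulas using the North Pole are given in Exercise 4.10.10.

As shown in Figure 4.10.4, $r$ goes from 0 to 1, $\varphi$ from 0 to $\pi/2$ (from the Equator to the North Pole), and $\theta$ from 0 to $2\pi$.

In Example 4.10.5, at $\theta = -\pi/4$ and $\theta = \pi/4$, $r = 0$.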
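The value $\pi/4$ of Example 4.10.8 can be reproduced with a midpoint sum in $(r, \theta, \varphi)$-space. This is a sketch with names of our own (`spherical_integral`, grid size `n`); it evaluates the right-hand side of Equation 4.10.13 on a box of the form $(0, r_{\max}) \times [0, 2\pi) \times (\varphi_{\text{lo}}, \varphi_{\text{hi}})$.

```python
import math


def spherical_integral(f, r_max, phi_lo, phi_hi, n=40):
    """Midpoint sum of f(S(r, theta, phi)) * r^2 cos(phi) over a box in (r, theta, phi)."""
    dr = r_max / n
    dtheta = 2 * math.pi / n
    dphi = (phi_hi - phi_lo) / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            theta = (j + 0.5) * dtheta
            for k in range(n):
                phi = phi_lo + (k + 0.5) * dphi
                # The spherical coordinate map S of Definition 4.10.6.
                x = r * math.cos(theta) * math.cos(phi)
                y = r * math.sin(theta) * math.cos(phi)
                z = r * math.sin(phi)
                total += f(x, y, z) * r * r * math.cos(phi)
    return total * dr * dtheta * dphi


# Upper half of the unit ball: r <= 1, latitude between 0 and pi / 2.
approx = spherical_integral(lambda x, y, z: z, 1.0, 0.0, math.pi / 2)
exact = math.pi / 4
print(approx, exact)
```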


Cylindrical coordinates

Cylindrical coordinates are important whenever you have an axis of symmetry. They correspond to describing a point in space by its altitude (i.e., its $z$-coordinate), and the polar coordinates $r, \theta$ of the projection in the $(x, y)$-plane, as shown in Figure 4.10.5.

Figure 4.10.5. In cylindrical coordinates, a point is specified by its distance $r$ from the $z$-axis, the polar angle $\theta$ shown above, and the $z$ coordinate.

Definition 4.10.9 (Cylindrical coordinates map). The cylindrical coordinates map $C$ maps a point in space known by its altitude $z$ and by the polar coordinates $r, \theta$ of the projection in the $(x, y)$-plane, to a point in $(x, y, z)$-space:

$$C : \begin{pmatrix} r \\ \theta \\ z \end{pmatrix} \mapsto \begin{pmatrix} x = r\cos\theta \\ y = r\sin\theta \\ z = z \end{pmatrix}. \tag{4.10.18}$$
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In Equation 4.10.19, the r in 
rdrd&dz corrects for distortion 
induced by the cylindrical coordi- 
nate map C. 

Exercise 4.10.3 asks you to de- 
rive the change of variable for- 
mula for cylindrical coordinates 
from the polar formula and Fu- 
bini’s theorem. 


Proposition 4.10.10 (Change of variables for cylindrical coordi- 
nates). Suppose f is an integrable function defined on K 3 , and suppose that 
the cylindrical coordinate map C maps a region B C (0, oo) x [0, 2?r) xl of 
the (r, 0, z)-space to a region A in the (x,y,z) -space. Then 



4.10.19 



Example 4.10.11 (Integrating a function over a cone). Let us integrate $(x^2 + y^2)z$ over the region $A \subset \mathbb{R}^3$ that is the part of the inverted cone $z^2 \ge x^2 + y^2$ where $0 \le z \le 1$, as shown in Figure 4.10.6. This corresponds under $C$ to the region $B$ where $r \le z \le 1$. Thus our integral becomes

$$\begin{aligned}
\int_A (x^2 + y^2)\,z \; dx\,dy\,dz &= \int_B r^2 z(\cos^2\theta + \sin^2\theta)\, r\,dr\,d\theta\,dz = \int_B (r^2 z)\, r\,dr\,d\theta\,dz \\
&= 2\pi \int_0^1 \left( \int_r^1 r^3 z \, dz \right) dr = 2\pi \int_0^1 \frac{r^3 - r^5}{2}\,dr = 2\pi\left(\frac{1}{8} - \frac{1}{12}\right) = \frac{\pi}{12}.
\end{aligned} \tag{4.10.20}$$

Figure 4.10.6. The region we are integrating over is bounded by this cone, with a flat top on it.

Since the integrand $r^3 z$ doesn't depend on $\theta$, the integral with respect to $\theta$ just multiplies the result by $2\pi$, which we did in the second line of Equation 4.10.20.

Note that it would have been unpleasant to express the flat top of the cone in spherical coordinates. △
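As a sanity check on Example 4.10.11, the iterated integral in cylindrical coordinates can be approximated numerically. This is a sketch under the assumptions of that example (region $r \le z \le 1$, integrand $r^2 z$, distortion factor $r$); the name `cone_integral` and the grid size are illustrative.

```python
import math

def cone_integral(n=400):
    """Midpoint-rule approximation of the integral of (x^2 + y^2) z over the
    cone r <= z <= 1, computed in cylindrical coordinates. The integrand
    becomes r^2 * z, and the extra factor r is the distortion of the map C."""
    dr = 1.0 / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        dz = (1.0 - r) / n          # z runs from the cone z = r up to the flat top z = 1
        inner = 0.0
        for k in range(n):
            z = r + (k + 0.5) * dz
            inner += (r * r * z) * r * dz
        total += inner * dr
    # The integrand does not depend on theta, so the theta-integral contributes 2*pi.
    return 2.0 * math.pi * total

result = cone_integral()
print(result, math.pi / 12)  # the two numbers should be close
```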

General change of variables formula 

Now let’s consider the general change of variables formula. 


“Injective” and “one to one” are synonyms.

We denote by $\mathbf{u}$ a point in $U$ and by $\mathbf{v}$ a point in $V$.

Theorem 4.10.12 (General change of variables formula). Let $X$ be a compact subset of $\mathbb{R}^n$, with boundary of volume 0, and $U$ an open neighborhood of $X$. Let $\Phi : U \to \mathbb{R}^n$ be a $C^1$ mapping with Lipschitz derivative, that is injective on $(X - \partial X)$, and such that $[D\Phi(\mathbf{x})]$ is invertible at every $\mathbf{x} \in (X - \partial X)$. Set $Y = \Phi(X)$.

Then if $f : Y \to \mathbb{R}$ is integrable, $(f \circ \Phi)\,|\det[D\Phi]|$ is integrable on $X$, and

$$\int_Y f(\mathbf{v})\,|d^n\mathbf{v}| = \int_X (f \circ \Phi)(\mathbf{u})\,|\det[D\Phi(\mathbf{u})]|\,|d^n\mathbf{u}|. \tag{4.10.21}$$

Once we have introduced improper integrals, in Section 4.11, we will be able to give a cleaner version (Theorem 4.11.16) of the change of variables theorem.

Let us see how our examples are special cases of this formula. Let us consider polar coordinates

$$P : \begin{pmatrix} r \\ \theta \end{pmatrix} \mapsto \begin{pmatrix} r\cos\theta \\ r\sin\theta \end{pmatrix}, \tag{4.10.22}$$

and let $f : \mathbb{R}^2 \to \mathbb{R}$ be an integrable function. Suppose that the support of $f$ is contained in the disk of radius $R$. Then set

$$X = \left\{ \begin{pmatrix} r \\ \theta \end{pmatrix} \;\middle|\; 0 \le r \le R,\ 0 \le \theta \le 2\pi \right\}, \tag{4.10.23}$$

and take $U$ to be any bounded neighborhood of $X$, for instance the disk centered at the origin in the $(r, \theta)$-plane of radius $R + 2\pi$. We claim all the requirements are satisfied: here $P$, which plays the role of $\Phi$, is of class $C^1$ in $U$ with Lipschitz derivative, and it is injective (one to one) on $X - \partial X$ (but not on the boundary). Moreover, $[DP]$ is invertible in $X - \partial X$, since $\det[DP] = r$, which is only zero on the boundary of $X$.

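The case of spherical coordinates

$$S : \begin{pmatrix} r \\ \theta \\ \varphi \end{pmatrix} \mapsto \begin{pmatrix} r\cos\varphi\cos\theta \\ r\cos\varphi\sin\theta \\ r\sin\varphi \end{pmatrix} \tag{4.10.24}$$

is very similar. If as before the function $f$ to be integrated has its support in the ball of radius $R$ around the origin, take

$$X = \left\{ \begin{pmatrix} r \\ \theta \\ \varphi \end{pmatrix} \;\middle|\; 0 \le r \le R,\ -\frac{\pi}{2} \le \varphi \le \frac{\pi}{2},\ 0 \le \theta \le 2\pi \right\}, \tag{4.10.25}$$

and $U$ any bounded open neighborhood of $X$. Then indeed $S$ is $C^1$ on $U$ with Lipschitz derivative; it is injective on $X - \partial X$, and its derivative is invertible there, since the determinant of the derivative is $r^2\cos\varphi$, which only vanishes on the boundary.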



Figure 4.10.7. The paving $P(\mathcal{D}_N^{\text{new}})$ of $\mathbb{R}^2$ corresponding to polar coordinates; the dimension of each block in the angular direction (the direction of the spokes) is $\pi/2^N$.

Remark 4.10.13. The requirement that $\Phi$ be injective (one to one) often creates great difficulties. In first year calculus, you didn't have to worry about the mapping being injective. This was because the integrand $dx$ of one-dimensional calculus is actually a form field, integrated over an oriented domain: $\int_a^b f\,dx = -\int_b^a f\,dx$.

For instance, consider $\int_1^4 dx$. If we set $x = u^2$, so that $dx = 2u\,du$, then $x = 4$ corresponds to $u = \pm 2$, while $x = 1$ corresponds to $u = \pm 1$. If we choose $u = -2$ for the first and $u = 1$ for the second, then the change of variables formula gives

$$\int_1^4 dx = \int_1^{-2} 2u\,du = \left[u^2\right]_1^{-2} = 4 - 1 = 3, \tag{4.10.26}$$

even though the change of variables was not injective. We will discuss forms in Chapter 6. The best statement of the change of variables formula makes use of forms, but it is beyond the scope of this book. △

Theorem 4.10.12 is proved in Appendix A.16; below we give an argument that is reasonably convincing without being rigorous.


A heuristic derivation of the change of variables formulas 


It is not hard to see why the change of variables formulas above are correct, and even the general formula. For each of the coordinate systems above, the standard paving $\mathcal{D}_N$ in the new space induces a paving in the original space.

Actually, when using polar, spherical, or cylindrical coordinates, you will be better off if you use paving blocks with side length $\pi/2^N$ in the angular directions, rather than the $1/2^N$ of standard dyadic cubes. (Since $\pi$ is irrational, dyadic fractions of radians do not fill up the circle exactly, but dyadic pieces of turns do.) We will call this paving $\mathcal{D}_N^{\text{new}}$, partly to specify these dimensions, but mainly to remember what space is being paved.

The paving of $\mathbb{R}^2$ corresponding to polar coordinates is shown in Figure 4.10.7; the paving of $\mathbb{R}^3$ corresponding to spherical coordinates is shown in Figure 4.10.8.

In the case of polar, spherical, and cylindrical coordinates, the paving clearly forms a nested partition. (When we make more general changes of variables $\Phi$, we will need to impose requirements that will make this true.) Thus given a change of variables mapping $\Phi$ with respect to the paving $\mathcal{D}_N^{\text{new}}$ we have


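$$\begin{aligned}
\int_V f\,|d^n\mathbf{v}| &= \lim_{N\to\infty} \sum_{C \in \mathcal{D}_N^{\text{new}}} M_{\Phi(C)}(f)\,\operatorname{vol}_n \Phi(C) \\
&= \lim_{N\to\infty} \sum_{C \in \mathcal{D}_N^{\text{new}}} M_C(f \circ \Phi)\,\frac{\operatorname{vol}_n \Phi(C)}{\operatorname{vol}_n C}\,\operatorname{vol}_n C.
\end{aligned} \tag{4.10.27}$$

This looks like the integral over $U$ of the product of $f \circ \Phi$ and the limit of the ratio

$$\frac{\operatorname{vol}_n \Phi(C)}{\operatorname{vol}_n C} \tag{4.10.28}$$

as $N \to \infty$, so that $C$ becomes small. This would give

$$\int_V f\,|d^n\mathbf{v}| \approx \int_U \left( (f \circ \Phi) \lim_{N\to\infty} \frac{\operatorname{vol}_n \Phi(C)}{\operatorname{vol}_n C} \right) |d^n\mathbf{u}|. \tag{4.10.29}$$

This isn't meaningful because the product of $f \circ \Phi$ and the ratio of Equation 4.10.28 isn't a function, so it can't be integrated.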

But recall (Equation 4.9.1) that the determinant is precisely designed to measure ratios of volumes under linear transformations. Of course our change of variables map $\Phi$ isn't linear, but if it is differentiable, then it is almost linear on small cubes, so we would expect

$$\frac{\operatorname{vol}_n \Phi(C)}{\operatorname{vol}_n C} \approx |\det[D\Phi(\mathbf{u})]| \tag{4.10.30}$$

when $C \in \mathcal{D}_N^{\text{new}}(\mathbb{R}^n)$ is small (i.e., $N$ is large), and $\mathbf{u} \in C$. So we might expect our integral $\int_V f$ to be equal to

$$\int_V f\,|d^n\mathbf{v}| = \int_U (f \circ \Phi)(\mathbf{u})\,|\det[D\Phi(\mathbf{u})]|\,|d^n\mathbf{u}|. \tag{4.10.31}$$

We find the above argument completely convincing; however, it is not a proof. Turning it into a proof is an unpleasant but basically straightforward exercise, found in Appendix A.16.

Figure 4.10.8. Under the spherical coordinate map $S$, a box with dimensions $\Delta r$, $\Delta\theta$, and $\Delta\varphi$, and anchored at $(r, \theta, \varphi)$ (top) is mapped to a curvilinear “box” with dimensions $\Delta r$, $r\cos\varphi\,\Delta\theta$, and $r\,\Delta\varphi$.

Example 4.10.14 (Ratio of areas for polar coordinates). Consider the ratio of Equation 4.10.30 in the case of polar coordinates, when $\Phi = P$, the polar coordinates map. If a rectangle $C$ in the $(r, \theta)$ plane, containing the point $(r_0, \theta_0)$, has sides of length $\Delta r$ and $\Delta\theta$, then the corresponding piece $P(C)$ of the $(x, y)$ plane is approximately a rectangle with sides $r_0\Delta\theta$, $\Delta r$. Thus its area is approximately $r_0\,\Delta r\,\Delta\theta$, and the ratio of areas is approximately $r_0$. Thus we would expect that

$$\int_V f\,|d^n\mathbf{v}| = \int_U (f \circ P)\, r \; dr\,d\theta, \tag{4.10.32}$$

where the $r$ on the right is the ratio of the volumes of infinitesimal paving blocks.

Indeed, for polar coordinates we find

$$\left[DP\begin{pmatrix} r \\ \theta \end{pmatrix}\right] = \begin{bmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{bmatrix}, \quad\text{so that}\quad \left|\det\left[DP\begin{pmatrix} r \\ \theta \end{pmatrix}\right]\right| = r, \tag{4.10.33}$$

explaining the $r$ in the change of variables formula, Equation 4.10.5. △
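The ratio of areas in Example 4.10.14 can be computed exactly for polar coordinates, because the image of a rectangle $[r_0, r_0 + \Delta r] \times [\theta_0, \theta_0 + \Delta\theta]$ is a sector of an annulus. The sketch below (the function name `polar_area_ratio` is an illustrative choice) shows the ratio approaching $r_0$ as the blocks shrink, using blocks of size $1/2^N$ by $\pi/2^N$.

```python
import math

def polar_area_ratio(r0, dr, dtheta):
    """Ratio area(P(C)) / area(C) for the rectangle
    C = [r0, r0 + dr] x [theta0, theta0 + dtheta] in the (r, theta)-plane.
    P(C) is a sector of an annulus, whose area is known in closed form;
    the ratio does not depend on theta0."""
    image_area = 0.5 * ((r0 + dr) ** 2 - r0 ** 2) * dtheta
    return image_area / (dr * dtheta)

for N in (2, 6, 10, 14):
    # radial side 1/2^N, angular side pi/2^N
    print(N, polar_area_ratio(1.5, 2.0 ** -N, math.pi / 2 ** N))
```

Algebraically the ratio is $r_0 + \Delta r/2$, so the printed values decrease toward $r_0 = 1.5$.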

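Example 4.10.15 (Ratio of volumes for spherical coordinates). In the case of spherical coordinates, where $\Phi = S$, the image $S(C)$ of a box $C \in \mathcal{D}_N^{\text{new}}(\mathbb{R}^3)$ with sides $\Delta r, \Delta\theta, \Delta\varphi$ is approximately a box with sides $\Delta r$, $r\Delta\varphi$, and $r\cos\varphi\,\Delta\theta$, so the ratio of the volumes is approximately $r^2\cos\varphi$.

Indeed, for spherical coordinates, we have

$$\left[DS\begin{pmatrix} r \\ \theta \\ \varphi \end{pmatrix}\right] = \begin{bmatrix} \cos\theta\cos\varphi & -r\sin\theta\cos\varphi & -r\cos\theta\sin\varphi \\ \sin\theta\cos\varphi & r\cos\theta\cos\varphi & -r\sin\theta\sin\varphi \\ \sin\varphi & 0 & r\cos\varphi \end{bmatrix}, \tag{4.10.34}$$

so that

$$\left|\det\left[DS\begin{pmatrix} r \\ \theta \\ \varphi \end{pmatrix}\right]\right| = r^2\cos\varphi. \tag{4.10.35}$$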
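The determinant $r^2\cos\varphi$ of Example 4.10.15 can be checked numerically without expanding the $3 \times 3$ determinant by hand. The following sketch estimates the derivative of $S$ by central differences; the helper `jacobian_det`, the step size, and the sample point are illustrative assumptions, and the result is an approximation rather than an exact derivative.

```python
import math

def S(r, theta, phi):
    """Spherical coordinates map, with phi the latitude."""
    return (r * math.cos(theta) * math.cos(phi),
            r * math.sin(theta) * math.cos(phi),
            r * math.sin(phi))

def jacobian_det(F, p, step=1e-6):
    """Determinant of the 3x3 derivative of F at p, estimated by
    central differences."""
    cols = []
    for j in range(3):
        plus, minus = list(p), list(p)
        plus[j] += step
        minus[j] -= step
        fp, fm = F(*plus), F(*minus)
        # cols[j][i] approximates the partial derivative dF_i / dp_j
        cols.append([(a - b) / (2.0 * step) for a, b in zip(fp, fm)])
    m = cols  # rows of m are the columns of [DF]; transposing keeps the determinant
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

r, theta, phi = 2.0, 0.7, 0.4
print(jacobian_det(S, (r, theta, phi)), r * r * math.cos(phi))
```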


Example 4.10.16 (A less standard change of variables). The region $T$

$$\left(\frac{x}{1-z}\right)^2 + \left(\frac{y}{1+z}\right)^2 \le 1, \quad -1 \le z \le 1 \tag{4.10.36}$$

looks like the curvy-sided tetrahedron pictured in Figure 4.10.9. We will compute its volume. The map $\gamma : [0, 2\pi] \times [0, 1] \times [-1, 1] \to \mathbb{R}^3$ given by

$$\gamma\begin{pmatrix} \theta \\ t \\ z \end{pmatrix} = \begin{pmatrix} t(1-z)\cos\theta \\ t(1+z)\sin\theta \\ z \end{pmatrix} \tag{4.10.37}$$

parametrizes $T$. The determinant of $[D\gamma]$ is

$$\det\begin{bmatrix} -t(1-z)\sin\theta & (1-z)\cos\theta & -t\cos\theta \\ t(1+z)\cos\theta & (1+z)\sin\theta & t\sin\theta \\ 0 & 0 & 1 \end{bmatrix} = -t(1-z^2). \tag{4.10.38}$$

Thus the volume is given by the integral

$$\int_0^{2\pi}\!\int_0^1\!\int_{-1}^1 \left|-t(1-z^2)\right| \, dz\,dt\,d\theta = \frac{4\pi}{3}. \qquad \triangle \tag{4.10.39}$$

Figure 4.10.9. The region $T$ resembles a cylinder flattened at the ends. Horizontal sections of $T$ are ellipses, which degenerate to lines when $z = \pm 1$.

Exercise 4.5.18 asks you to solve a problem of the same sort.
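The volume $4\pi/3$ of Example 4.10.16 can be cross-checked in two independent ways. The sketch below is illustrative only: `parametrized_volume` integrates $|\det[D\gamma]| = t(1 - z^2)$ by a midpoint rule, while `monte_carlo_volume` samples random points of a bounding box and tests membership in $T$ directly from Equation 4.10.36. The sample count and seed are arbitrary choices, and the Monte Carlo estimate is only accurate to a few hundredths.

```python
import math
import random

def in_T(x, y, z):
    """Membership test for T, with denominators cleared so that nothing is
    divided by zero at z = +1 or z = -1."""
    a, b = 1.0 - z, 1.0 + z
    return (x * b) ** 2 + (y * a) ** 2 <= (a * b) ** 2

def monte_carlo_volume(samples=200_000, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # T lies inside the box [-2, 2] x [-2, 2] x [-1, 1]
        x, y, z = rng.uniform(-2, 2), rng.uniform(-2, 2), rng.uniform(-1, 1)
        hits += in_T(x, y, z)
    return 32.0 * hits / samples   # 32 is the volume of the box

def parametrized_volume(n=200):
    """Midpoint rule for the integral of |det[D gamma]| = t (1 - z^2)."""
    dt, dz = 1.0 / n, 2.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        for k in range(n):
            z = -1.0 + (k + 0.5) * dz
            total += t * (1.0 - z * z) * dt * dz
    return 2.0 * math.pi * total   # the theta-integral contributes 2*pi

print(parametrized_volume(), monte_carlo_volume(), 4.0 * math.pi / 3.0)
```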


4.11 Improper Integrals 


There are many reasons to 
study improper integrals. An es- 
sential one is the Fourier trans- 
form, the fundamental tool of en- 
gineering and signal processing 
(not to mention harmonic analy- 
sis). Improper integrals are also 
ubiquitous in probability theory. 


So far all our work has involved the integrals of bounded functions with bounded 
support. In this section we will relax both of these conditions, studying improper 
integrals: integrals of functions that are not bounded or do not have bounded 
support, or both. 


Improper integrals in one dimension 

In one variable, you probably already encountered improper integrals: integrals like

$$\int_{-\infty}^{\infty} \frac{1}{1+x^2}\,dx = \left[\arctan x\right]_{-\infty}^{\infty} = \pi, \tag{4.11.1}$$

$$\int_0^{\infty} x^n e^{-x}\,dx = n!, \tag{4.11.2}$$

$$\int_0^1 \frac{1}{\sqrt{x}}\,dx = \left[2\sqrt{x}\right]_0^1 = 2. \tag{4.11.3}$$

In the cases above, even though the domain is unbounded, or the function is unbounded (or both), the function can be integrated, although you have to work a bit to define the integral: upper and lower sums do not exist. For the first two examples above, one can imagine writing upper and lower sums with respect to a dyadic partition; instead of being finite, these sums are infinite series whose convergence needs to be checked. For the third example, any upper sum will be infinite, since the maximum of the function over the cube containing 0 is infinity.

We will see below how to define such integrals, and will see that there are analogous multiple integrals, like

$$\int_{\mathbb{R}^n} \frac{1}{1 + |\mathbf{x}|^{n+1}}\,|d^n\mathbf{x}|. \tag{4.11.4}$$

There are other improper integrals, like

$$\int_0^{\infty} \frac{\sin x}{x}\,dx, \tag{4.11.5}$$

which are much more troublesome. You can define this integral as

$$\lim_{A\to\infty} \int_0^A \frac{\sin x}{x}\,dx \tag{4.11.6}$$

and show that the limit exists, for instance, by saying that

$$\sum_{k=0}^{\infty} \int_{k\pi}^{(k+1)\pi} \frac{\sin x}{x}\,dx \tag{4.11.7}$$

is an alternating series whose terms decrease in absolute value and go to 0 as $k \to \infty$. But this works only because positive and negative terms cancel: the area between the graph of $\sin x / x$ and the $x$ axis is infinite, and the limit

$$\lim_{A\to\infty} \int_0^A \left|\frac{\sin x}{x}\right| dx \tag{4.11.8}$$

does not exist. Improper integrals like this, whose existence depends on cancellations, do not generalize at all well to the framework of multiple integrals. In particular, no version of Fubini's theorem or the change of variables formula is true for such integrals, and we will carefully avoid them.
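The contrast between Equations 4.11.6 and 4.11.8 is easy to see numerically. The sketch below (a midpoint-rule approximation with illustrative names and grid sizes) computes the terms of the series in Equation 4.11.7 one arch at a time: the signed partial sums settle down near $\pi/2$, the known value of this integral, while the sums of absolute values keep growing, roughly like a logarithm.

```python
import math

def arch(k, m=200):
    """Midpoint-rule value of the integral of sin(x)/x over [k*pi, (k+1)*pi].
    Midpoints never hit x = 0, so no special case is needed there."""
    h = math.pi / m
    total = 0.0
    for j in range(m):
        x = k * math.pi + (j + 0.5) * h
        total += math.sin(x) / x * h
    return total

terms = [arch(k) for k in range(200)]
signed = sum(terms)                      # approximates Equation 4.11.6 with A = 200*pi
absolute = sum(abs(t) for t in terms)    # approximates Equation 4.11.8 with A = 200*pi
print(signed, math.pi / 2)
print(absolute)
```

Since $\sin x$ has constant sign on each arch, the absolute value of each term equals the integral of $|\sin x / x|$ over that arch, which is why `absolute` approximates the integral in Equation 4.11.8.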


Defining improper integrals 

It is harder to define improper integrals (integrals of functions that are unbounded, or have unbounded support, or both) than to define “proper” integrals. It is not enough to come up with a coherent definition: without Fubini's theorem and the change of variables formula, integrals aren't of much interest, so we need a definition for which these theorems are true, in appropriately modified form.

Using $\overline{\mathbb{R}} = \mathbb{R} \cup \{+\infty, -\infty\}$ rather than $\mathbb{R}$ is purely a matter of convenience: it avoids speaking of functions defined except on a set of volume 0. Allowing infinite values does not affect our results in any substantial way; if a function were ever going to be infinite on a set that didn't have volume 0, none of our theorems would apply in any case.

We will proceed in two steps: first we will define improper integrals of non-negative functions; then we will deal with the general case. Our basic approach will be to cut off a function so that it is bounded with bounded support, integrate the truncated function, and then let the cut-off go to infinity, and see what happens in the limit.

Let $f : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ be a function satisfying $f(\mathbf{x}) \ge 0$ everywhere. We allow the value $+\infty$, because we want to integrate functions like

$$f(\mathbf{x}) = \frac{1}{\sqrt{|\mathbf{x}|}}; \tag{4.11.9}$$

setting this function equal to $+\infty$ at the origin avoids having to say that the function is undefined at the origin. We will denote by $\overline{\mathbb{R}}$ the real numbers extended to include $+\infty$ and $-\infty$:

$$\overline{\mathbb{R}} = \mathbb{R} \cup \{+\infty, -\infty\}. \tag{4.11.10}$$

In order to define the improper integral, or I-integral, of a function $f$ that is not bounded with bounded support, we will use truncated versions of $f$, which are bounded with bounded support, as shown in Figure 4.11.1.


Definition 4.11.1 ($R$-truncation). The $R$-truncation $[f]_R$ is given by the formula

$$[f]_R(\mathbf{x}) = \begin{cases} f(\mathbf{x}) & \text{if } |\mathbf{x}| \le R \text{ and } f(\mathbf{x}) \le R; \\ R & \text{if } |\mathbf{x}| \le R \text{ and } f(\mathbf{x}) > R; \\ 0 & \text{if } |\mathbf{x}| > R. \end{cases} \tag{4.11.11}$$

For example, if we truncate by $R = 1$, then

$$[f]_1(\mathbf{x}) = \begin{cases} f(\mathbf{x}) & \text{if } |\mathbf{x}| \le 1,\ f(\mathbf{x}) \le 1 \\ 1 & \text{if } |\mathbf{x}| \le 1,\ f(\mathbf{x}) > 1 \\ 0 & \text{if } |\mathbf{x}| > 1. \end{cases}$$

Note that if $R_1 \le R_2$, then $[f]_{R_1} \le [f]_{R_2}$. In particular, if all $[f]_R$ are integrable, then

$$\int_{\mathbb{R}^n} [f]_{R_1}(\mathbf{x})\,|d^n\mathbf{x}| \le \int_{\mathbb{R}^n} [f]_{R_2}(\mathbf{x})\,|d^n\mathbf{x}|. \tag{4.11.12}$$

We will use the term I-integral to mean “improper integral,” and I-integrable to mean “improperly integrable.”

In Equation 4.11.13 we could write $\lim_{R\to\infty}$ rather than $\sup_R$. The condition for I-integrability says that the integral in Equation 4.11.13 must be finite, for any choice of $R$.

Definition 4.11.2 (Improper integral). If the function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ is non-negative (i.e., satisfies $f(\mathbf{x}) \ge 0$), it is improperly integrable if all $[f]_R$ are integrable, and

$$\sup_R \int_{\mathbb{R}^n} [f]_R(\mathbf{x})\,|d^n\mathbf{x}| < \infty. \tag{4.11.13}$$

The supremum is then called the improper integral, or I-integral, of $f$.

If $f$ has both positive and negative values, write $f = f^+ - f^-$, where both $f^+$ and $f^-$ are non-negative (Definition 4.3.4). Then $f$ is I-integrable if and only if both $f^+$ and $f^-$ are I-integrable, and

$$\int_{\mathbb{R}^n} f(\mathbf{x})\,|d^n\mathbf{x}| = \int_{\mathbb{R}^n} f^+(\mathbf{x})\,|d^n\mathbf{x}| - \int_{\mathbb{R}^n} f^-(\mathbf{x})\,|d^n\mathbf{x}|. \tag{4.11.14}$$

Note that since $f^+$ and $f^-$ are both I-integrable, the I-integrability of $f$ does not depend on positive and negative terms canceling each other.

Figure 4.11.1. Graph of a function $f$, truncated at $R$ to form $[f]_R$; unlike $f$, the function $[f]_R$ is bounded with bounded support.

A function that is not I-integrable may qualify for a weaker form of integrability, local integrability:

Definition 4.11.3 (Local integrability). A function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ is locally integrable if all the functions $[f]_R$ are integrable.

For example, the constant function 1 is locally integrable but not I-integrable.

Of course a function that is I-integrable is also locally integrable, but improper integrability and local integrability address two very different concerns. Local integrability, as its name suggests, concerns local behavior; the only way a bounded function with bounded support, like $[f]_R$, can fail to be integrable is if it has “local nonsense,” like the function which is 1 on the rationals and 0 on the irrationals. This is usually not the question of interest when we are discussing improper integrals; there the real issue is how the function grows at infinity: knowing whether the integral is finite.
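The $R$-truncation of Definition 4.11.1 and the supremum of Definition 4.11.2 can be illustrated in one variable. The sketch below uses a hypothetical example function chosen for this illustration (it is $1/\sqrt{|x|}$ on $[-1, 1]$, $+\infty$ at the origin, and 0 elsewhere); the names `truncate` and `integral_of_truncation` are likewise illustrative. For this function a direct computation gives $\int [f]_R = 4 - 2/R$ when $R \ge 1$, so the integrals increase with $R$ and their supremum, the I-integral, is 4.

```python
import math

def f(x):
    """Example non-negative function with values in [0, +inf]:
    1/sqrt(|x|) on [-1, 1] (set to +inf at the origin), 0 elsewhere."""
    if abs(x) > 1.0:
        return 0.0
    return math.inf if x == 0.0 else 1.0 / math.sqrt(abs(x))

def truncate(f, R):
    """Return the R-truncation [f]_R of Definition 4.11.1 (one variable)."""
    def f_R(x):
        if abs(x) > R:
            return 0.0
        return min(f(x), R)   # cut the values off at height R
    return f_R

def integral_of_truncation(f, R, n=100_000):
    """Midpoint rule for the integral of [f]_R, whose support lies in [-R, R]."""
    g = truncate(f, R)
    h = 2.0 * R / n
    return sum(g(-R + (i + 0.5) * h) for i in range(n)) * h

values = {R: integral_of_truncation(f, R) for R in (1, 2, 4, 8)}
for R, v in values.items():
    print(R, v, 4.0 - 2.0 / R)   # numerical value next to the closed form
```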


Generalities about improper integrals 


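Remember that $\overline{\mathbb{R}}$ denotes the real numbers extended to include $+\infty$ and $-\infty$.

Proposition 4.11.4 (Linearity of improper integrals). If $f, g : \mathbb{R}^n \to \overline{\mathbb{R}}$ are I-integrable, and $a, b \in \mathbb{R}$, then $af + bg$ is I-integrable, and

$$\int_{\mathbb{R}^n} \bigl(af(\mathbf{x}) + bg(\mathbf{x})\bigr)\,|d^n\mathbf{x}| = a\int_{\mathbb{R}^n} f(\mathbf{x})\,|d^n\mathbf{x}| + b\int_{\mathbb{R}^n} g(\mathbf{x})\,|d^n\mathbf{x}|. \tag{4.11.15}$$

Proof. It is enough to prove the result when $f$ and $g$ are non-negative. In that case, the proposition follows from the computation:

$$\begin{aligned}
a\int_{\mathbb{R}^n} f(\mathbf{x})\,|d^n\mathbf{x}| + b\int_{\mathbb{R}^n} g(\mathbf{x})\,|d^n\mathbf{x}| &= a\sup_R \int_{\mathbb{R}^n} [f]_R(\mathbf{x})\,|d^n\mathbf{x}| + b\sup_R \int_{\mathbb{R}^n} [g]_R(\mathbf{x})\,|d^n\mathbf{x}| \\
&= \sup_R \int_{\mathbb{R}^n} \bigl(a[f]_R(\mathbf{x}) + b[g]_R(\mathbf{x})\bigr)\,|d^n\mathbf{x}| \\
&= \int_{\mathbb{R}^n} (af + bg)(\mathbf{x})\,|d^n\mathbf{x}|. \qquad \square
\end{aligned} \tag{4.11.16}$$

Proposition 4.11.5 (Criterion for improper integrals). A function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ is I-integrable if and only if it is locally integrable and $|f|$ is I-integrable.

Proposition 4.11.5 gives a criterion for integrability.

Proof. If $f$ is locally integrable, so are $f^+$ and $f^-$, and since

$$\int_{\mathbb{R}^n} [f^+]_R(\mathbf{x})\,|d^n\mathbf{x}| \le \int_{\mathbb{R}^n} [|f|]_R(\mathbf{x})\,|d^n\mathbf{x}| \le \int_{\mathbb{R}^n} |f(\mathbf{x})|\,|d^n\mathbf{x}| \tag{4.11.17}$$

is bounded, we see that $f^+$ (and analogously $f^-$) are both I-integrable. Conversely, if $f$ is I-integrable, then $|f| = f^+ + f^-$ is also. □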


Volume of unbounded sets 

In Section 4.1 we defined the $n$-dimensional volume of a bounded subset $A \subset \mathbb{R}^n$. Now we can define the volume of any subset.

Definition 4.11.6 (Volume of a subset of $\mathbb{R}^n$). The volume of any subset $A \subset \mathbb{R}^n$ is

$$\operatorname{vol}_n A = \int_A |d^n\mathbf{x}| = \int_{\mathbb{R}^n} \chi_A(\mathbf{x})\,|d^n\mathbf{x}| = \sup_R \int_{\mathbb{R}^n} [\chi_A]_R(\mathbf{x})\,|d^n\mathbf{x}|.$$

When we spoke of the volume of graphs in Section 4.3, the best we could do (Corollary 4.3.6) was to say that any bounded part of the graph of an integrable function has volume 0. Now we can drop that annoying qualification.

A curve has length but no area (its two-dimensional volume is 0). A plane has area, but its three-dimensional volume is 0 . . . .

Thus a subset $A$ has volume 0 if its characteristic function $\chi_A$ is I-integrable, with I-integral 0.

With this definition, several earlier statements where we had to insert “any bounded part of” become true without that restriction:

Proposition 4.11.7 (Manifold has volume 0). (a) Any closed manifold $M \subset \mathbb{R}^n$ of dimension less than $n$ has $n$-dimensional volume 0.

(b) In particular, any subspace $E \subset \mathbb{R}^n$ with $\dim E < n$ has $n$-dimensional volume 0.

Corollary 4.11.8 (Graph has volume 0). If $f : \mathbb{R}^n \to \mathbb{R}$ is an integrable function, then its graph $\Gamma(f) \subset \mathbb{R}^{n+1}$ has $(n+1)$-dimensional volume 0.


Integrals and limits 

The presence of sup in Definition 4.11.2 tells us that we are going to need to know something about how integrals of limits of functions behave if we are going to prove anything about improper integrals.

What we would like to be able to say is that if $f_k$ is a convergent sequence of functions, then, as $k \to \infty$, the integral of the limit of the $f_k$ is the same as the limit of the integral of $f_k$. There is one setting where this is true and straightforward: uniformly convergent sequences of integrable functions, all with support in the same bounded set.

The behavior of integrals under limits is a big topic; the main raison d'être for the Lebesgue integral is that it is better behaved under limits than the Riemann integral. We will not introduce the Lebesgue integral in this book, but in this subsection we will give the strongest statement that is possible using the Riemann integral.

Definition 4.11.9 (Uniform convergence). A sequence of functions $f_k : \mathbb{R}^n \to \mathbb{R}$ converges uniformly to a function $f$ if for every $\epsilon > 0$, there exists $K$ such that when $k \ge K$, then $|f_k(\mathbf{x}) - f(\mathbf{x})| < \epsilon$ for all $\mathbf{x}$.

The key condition is that given $\epsilon$, the same $K$ works for all $\mathbf{x}$.

The three sequences of functions in Example 4.11.11 below provide typical examples of non-uniform convergence. Uniform convergence on all of $\mathbb{R}^n$ isn't a very common phenomenon, unless something is done to cut down the domain. For instance, suppose that

$$p_k(x) = a_{0,k} + a_{1,k}x + \cdots + a_{m,k}x^m \tag{4.11.18}$$

is a sequence of polynomials all of degree $\le m$, and that this sequence “converges” in the “obvious” sense that for each degree $i$, the sequence of coefficients $a_{i,0}, a_{i,1}, a_{i,2}, \dots$ converges. Then $p_k$ does not in general converge uniformly on $\mathbb{R}$. But for any bounded set $A$, the sequence $p_k\chi_A$ does converge uniformly.

Exercise 4.11.1 asks you to verify these statements.

Instead of writing “the sequence $p_k\chi_A$” we could write “$p_k$ restricted to $A$.” We use $p_k\chi_A$ because we will use such restrictions in integration, and we use $\chi$ to define the integral over a subset $A$:

$$\int_A p(\mathbf{x}) = \int_{\mathbb{R}^n} p(\mathbf{x})\chi_A(\mathbf{x})$$

(see Equation 4.1.5 concerning the coastline of Britain).

Theorem 4.11.10 (When the limit of an integral equals the integral of the limit). If $f_k$ is a sequence of bounded integrable functions, all with support in a fixed ball $B_R \subset \mathbb{R}^n$, and converging uniformly to a function $f$, then $f$ is integrable, and

$$\lim_{k\to\infty} \int_{\mathbb{R}^n} f_k(\mathbf{x})\,|d^n\mathbf{x}| = \int_{\mathbb{R}^n} f(\mathbf{x})\,|d^n\mathbf{x}|. \tag{4.11.19}$$

Proof. Choose $\epsilon > 0$ and $K$ so large that $\sup_{\mathbf{x}\in\mathbb{R}^n} |f(\mathbf{x}) - f_k(\mathbf{x})| < \epsilon$ when $k > K$. Then

$$L_N(f) > L_N(f_k) - \epsilon\operatorname{vol}_n(B_R) \quad\text{and}\quad U_N(f) < U_N(f_k) + \epsilon\operatorname{vol}_n(B_R) \tag{4.11.20}$$

when $k > K$. Now choose $N$ so large that $U_N(f_k) - L_N(f_k) < \epsilon$; we get

$$U_N(f) - L_N(f) < \underbrace{U_N(f_k) - L_N(f_k)}_{<\,\epsilon} + 2\epsilon\operatorname{vol}_n(B_R), \tag{4.11.21}$$

yielding $U(f) - L(f) \le \epsilon(1 + 2\operatorname{vol}_n(B_R))$. Since $\epsilon$ is arbitrary, this gives the result. □

Equation 4.11.20: if you picture this as Riemann sums in one variable, $\epsilon$ is the difference between the height of the lower rectangles for $f$, and the height of the lower rectangles for $f_k$, while the total width of all the rectangles is $\operatorname{vol}_n(B_R)$, since $B_R$ is the support for $f_k$.

In many cases Theorem 4.11.10 is good enough, but it cannot deal with unbounded functions, or functions with unbounded support. Example 4.11.11 shows some of the things that can go wrong.

The dominated convergence theorem avoids the problem of non-integrability illustrated by Equation 4.11.26, by making the local integrability of $f$ part of the hypothesis.

The dominated convergence theorem is one of the fundamental results of Lebesgue integration theory. The difference between our presentation, which uses Riemann integration, and the Lebesgue version is that we have to assume that $f$ is locally integrable, whereas this is part of the conclusion in the Lebesgue theory. It is hard to overstate the importance of this difference.

The “dominated” in the title refers to the $|f_k|$ being dominated by $g$.

Example 4.11.11 (Cases where the mass of an integral gets lost). Here are three sequences of functions where the limit of the integral is not the integral of the limit.

(1) When $f_k$ is defined by

$$f_k(x) = \begin{cases} 1 & \text{if } k \le x \le k+1 \\ 0 & \text{otherwise,} \end{cases} \tag{4.11.22}$$

the mass of the integral is contained in a square 1 high and 1 wide; as $k \to \infty$ this mass drifts off to infinity and gets lost:

$$\lim_{k\to\infty} \int_0^{\infty} f_k(x)\,dx = 1, \quad\text{but}\quad \int_0^{\infty} \lim_{k\to\infty} f_k(x)\,dx = \int_0^{\infty} 0\,dx = 0. \tag{4.11.23}$$

(2) For the function

$$f_k(x) = \begin{cases} k & \text{if } 0 < x < \frac{1}{k} \\ 0 & \text{otherwise,} \end{cases} \tag{4.11.24}$$

the mass is contained in a rectangle $k$ high and $1/k$ wide; as $k \to \infty$, the height of the box approaches $\infty$ and its width approaches 0:

$$\lim_{k\to\infty} \int_0^1 f_k(x)\,dx = 1, \quad\text{but}\quad \int_0^1 \lim_{k\to\infty} f_k(x)\,dx = \int_0^1 0\,dx = 0. \tag{4.11.25}$$

(3) The third example is less serious, but still a nasty irritant. Let us make a list $a_1, a_2, \dots$ of the rational numbers between 0 and 1. Now define

$$f_k(x) = \begin{cases} 1 & \text{if } x \in \{a_1, \dots, a_k\} \\ 0 & \text{otherwise.} \end{cases} \tag{4.11.26}$$

Then

$$\int_0^1 f_k(x)\,dx = 0 \quad\text{for all } k, \tag{4.11.27}$$

but $\lim_{k\to\infty} f_k$ is the function which is 1 on the rationals and 0 on the irrationals between 0 and 1, and hence not integrable. △
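The first two sequences of Example 4.11.11 can be explored numerically. In the sketch below (illustrative names, midpoint rule, closed interval in sequence (1) and open interval in sequence (2)), every $f_k$ has integral close to 1, yet at any fixed point $x$ the values $f_k(x)$ are eventually 0, so the pointwise limit has integral 0.

```python
def bump_drifting(k, x):
    """Sequence (1): height 1 on [k, k+1], zero elsewhere."""
    return 1.0 if k <= x <= k + 1 else 0.0

def bump_narrowing(k, x):
    """Sequence (2): height k on (0, 1/k), zero elsewhere."""
    return float(k) if 0.0 < x < 1.0 / k else 0.0

def midpoint(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

for k in (1, 5, 25):
    drifting = midpoint(lambda x: bump_drifting(k, x), 0.0, k + 2.0)
    narrowing = midpoint(lambda x: bump_narrowing(k, x), 0.0, 1.0)
    print(k, drifting, narrowing)   # both integrals stay close to 1

# At a fixed point the values are eventually 0:
print([bump_drifting(k, 3.7) for k in (3, 4, 100)])
print([bump_narrowing(k, 0.3) for k in (3, 4, 100)])
```

Neither sequence is dominated by a single I-integrable function $g$: in (1) such a $g$ would have to be at least 1 on all of $[1, \infty)$, and in (2) it would have to be at least about $1/x$ near 0.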

Our treatment of integrals and limits will be based on the dominated convergence theorem, which avoids the pitfalls of disappearing mass. This theorem is the strongest statement that can be made concerning integrals and limits if one is restricted to the Riemann integral.

Theorem 4.11.12 (Dominated convergence theorem for Riemann integrals). Let $f_k : \mathbb{R}^n \to \overline{\mathbb{R}}$ be a sequence of I-integrable functions, let $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ be a locally integrable function, and let $g : \mathbb{R}^n \to \overline{\mathbb{R}}$ be I-integrable. Suppose that all $f_k$ satisfy $|f_k| \le g$, and

$$\lim_{k\to\infty} f_k(\mathbf{x}) = f(\mathbf{x})$$

except perhaps for $\mathbf{x}$ in a set $B$ of volume 0. Then

$$\lim_{k\to\infty} \int_{\mathbb{R}^n} f_k(\mathbf{x})\,|d^n\mathbf{x}| = \int_{\mathbb{R}^n} \lim_{k\to\infty} f_k(\mathbf{x})\,|d^n\mathbf{x}| = \int_{\mathbb{R}^n} f(\mathbf{x})\,|d^n\mathbf{x}|. \tag{4.11.28}$$

The crucial condition above is that all the $|f_k|$ are bounded by the I-integrable function $g$; this prevents the mass of the integral of the $f_k$ from escaping to infinity, as in the first two functions of Example 4.11.11. The requirement that $f$ be locally integrable prevents the kind of “local nonsense” we saw in the third function of Example 4.11.11.

The proof, in Appendix A.18, is quite difficult and very tricky.

Before rolling out the consequences, let us state another result, which is often 
easier to use. 


Saying $f_1 \le f_2$ means that for any $\mathbf{x}$, $f_1(\mathbf{x}) \le f_2(\mathbf{x})$.

The conclusions of the dominated convergence theorem and the monotone convergence theorem are not identical; the integrals in Equation 4.11.28 are finite while those in Equation 4.11.30 may be infinite.

Theorem 4.11.13 (Monotone convergence theorem). Let $f_k : \mathbb{R}^n \to \overline{\mathbb{R}}$ be a sequence of I-integrable functions, and $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ be a locally integrable function, such that

$$0 \le f_1 \le f_2 \le \cdots, \quad\text{and}\quad \sup_k f_k(\mathbf{x}) = f(\mathbf{x}) \tag{4.11.29}$$

except perhaps for $\mathbf{x}$ in a set $B$ of volume 0. Then

$$\sup_k \int_{\mathbb{R}^n} f_k(\mathbf{x})\,|d^n\mathbf{x}| = \int_{\mathbb{R}^n} \underbrace{f(\mathbf{x})}_{\sup_k f_k(\mathbf{x})}\,|d^n\mathbf{x}|, \tag{4.11.30}$$

in the sense that they are either both infinite, or they are both finite and equal.

Note that the requirement in the dominated convergence theorem that $|f_k| \le g$ is replaced in the monotone convergence theorem by the requirement that the $f_k$ be monotone increasing: $0 \le f_1 \le f_2 \le \cdots$.


Proof. By the dominated convergence theorem,

$$\sup_k \int_{\mathbb{R}^n} [f_k]_R(\mathbf{x})\,|d^n\mathbf{x}| = \int_{\mathbb{R}^n} [f]_R(\mathbf{x})\,|d^n\mathbf{x}|, \tag{4.11.31}$$

since all the $[f_k]_R$ are bounded by $[f]_R$, which is I-integrable (i.e., $[f]_R$ plays the role of $g$ in the dominated convergence theorem). Taking the sup as $R \to \infty$ of both sides gives

$$\sup_R \int_{\mathbb{R}^n} [f]_R(\mathbf{x})\,|d^n\mathbf{x}| = \sup_R \sup_k \int_{\mathbb{R}^n} [f_k]_R(\mathbf{x})\,|d^n\mathbf{x}| = \sup_k \sup_R \int_{\mathbb{R}^n} [f_k]_R(\mathbf{x})\,|d^n\mathbf{x}|, \tag{4.11.32}$$

and either both sides are infinite, or they are both finite and equal. But

$$\sup_R \int_{\mathbb{R}^n} [f]_R(\mathbf{x})\,|d^n\mathbf{x}| = \int_{\mathbb{R}^n} f(\mathbf{x})\,|d^n\mathbf{x}| \tag{4.11.33}$$

and

$$\sup_k \sup_R \int_{\mathbb{R}^n} [f_k]_R(\mathbf{x})\,|d^n\mathbf{x}| = \sup_k \int_{\mathbb{R}^n} f_k(\mathbf{x})\,|d^n\mathbf{x}|. \qquad \square \tag{4.11.34}$$

Unlike limits, sups can always be exchanged, so in Equation 4.11.32 we can rewrite $\sup_R \sup_k$ as $\sup_k \sup_R$.

Equation 4.11.33 is the definition of I-integrability, applied to $f$; Equation 4.11.34 is the same definition, applied to $f_k$.

Fubini's theorem and improper integrals

We will now show that if you state it carefully, Fubini's theorem is true for improper integrals.

There is one function $\mathbf{y} \mapsto [f]_R(\mathbf{x}, \mathbf{y})$ for each value of $\mathbf{x}$.

Here, $\mathbf{x}$ represents $n$ entries of a point in $\mathbb{R}^{n+m}$, and $\mathbf{y}$ represents the remaining $m$ entries.

Theorem 4.11.14 (Fubini's theorem for improper integrals). Let $f : \mathbb{R}^n \times \mathbb{R}^m \to \overline{\mathbb{R}}$ be a function such that

(1) The functions $\mathbf{y} \mapsto [f]_R(\mathbf{x}, \mathbf{y})$ are integrable;

(2) The function $h(\mathbf{x}) = \int_{\mathbb{R}^m} f(\mathbf{x}, \mathbf{y})\,|d^m\mathbf{y}|$ is locally integrable as a function of $\mathbf{x}$;

(3) The function $f$ is locally integrable.

Then $f$ is I-integrable if and only if $h$ is I-integrable, and if both are I-integrable, then

$$\int_{\mathbb{R}^n \times \mathbb{R}^m} f(\mathbf{x}, \mathbf{y})\,|d^n\mathbf{x}|\,|d^m\mathbf{y}| = \int_{\mathbb{R}^n} \left( \int_{\mathbb{R}^m} f(\mathbf{x}, \mathbf{y})\,|d^m\mathbf{y}| \right) |d^n\mathbf{x}|. \tag{4.11.35}$$

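Proof. To lighten notation, let us denote by $h_R$ the function

$$h_R(\mathbf{x}) = \int_{\mathbb{R}^m} [f]_R(\mathbf{x}, \mathbf{y})\,|d^m\mathbf{y}|; \quad\text{note that}\quad \lim_{R\to\infty} h_R(\mathbf{x}) = h(\mathbf{x}). \tag{4.11.36}$$

Applying Fubini's theorem (Theorem 4.5.8) gives

$$\begin{aligned}
\int_{\mathbb{R}^n \times \mathbb{R}^m} [f]_R(\mathbf{x}, \mathbf{y})\,|d^n\mathbf{x}|\,|d^m\mathbf{y}| &= \int_{\mathbb{R}^n} \left( \int_{\mathbb{R}^m} [f]_R(\mathbf{x}, \mathbf{y})\,|d^m\mathbf{y}| \right) |d^n\mathbf{x}| \\
&= \int_{\mathbb{R}^n} h_R(\mathbf{x})\,|d^n\mathbf{x}|.
\end{aligned} \tag{4.11.37}$$

Taking the sup of both sides as $R \to \infty$ and (for the second equality) applying the monotone convergence theorem to $h_R$, which we can do because $h$ is locally integrable and the $h_R$ are increasing as $R$ increases, gives

$$\begin{aligned}
\sup_R \int_{\mathbb{R}^n \times \mathbb{R}^m} [f]_R(\mathbf{x}, \mathbf{y})\,|d^n\mathbf{x}|\,|d^m\mathbf{y}| &= \sup_R \int_{\mathbb{R}^n} h_R(\mathbf{x})\,|d^n\mathbf{x}| \\
&= \int_{\mathbb{R}^n} \sup_R h_R(\mathbf{x})\,|d^n\mathbf{x}|.
\end{aligned} \tag{4.11.38}$$

Thus we have

$$\begin{array}{ccc}
\displaystyle \sup_R \int_{\mathbb{R}^n \times \mathbb{R}^m} [f]_R(\mathbf{x}, \mathbf{y})\,|d^n\mathbf{x}|\,|d^m\mathbf{y}| & = & \displaystyle \int_{\mathbb{R}^n} \sup_R \underbrace{\int_{\mathbb{R}^m} [f]_R(\mathbf{x}, \mathbf{y})\,|d^m\mathbf{y}|}_{h_R(\mathbf{x})}\,|d^n\mathbf{x}| \\
\| & & \| \\
 & & \displaystyle \int_{\mathbb{R}^n} \left( \int_{\mathbb{R}^m} \sup_R\,[f]_R(\mathbf{x}, \mathbf{y})\,|d^m\mathbf{y}| \right) |d^n\mathbf{x}| \\
\| & & \| \\
\displaystyle \int_{\mathbb{R}^n \times \mathbb{R}^m} f(\mathbf{x}, \mathbf{y})\,|d^n\mathbf{x}|\,|d^m\mathbf{y}| & & \displaystyle \int_{\mathbb{R}^n} \left( \int_{\mathbb{R}^m} f(\mathbf{x}, \mathbf{y})\,|d^m\mathbf{y}| \right) |d^n\mathbf{x}|. \qquad \square
\end{array} \tag{4.11.39}$$

The terms connected by $\|$ are equal. On the left-hand side of Equation 4.11.39, the first line is simply the definition of the improper integral of $f$ on the third line. On the right-hand side, to go from the first to the second line we use the monotone convergence theorem, applied to $[f]_R$.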


It's not immediately apparent that $1/(1 + x^2 + y^2)$ is not integrable; it looks very similar to the function in one variable, $1/(1 + x^2)$ of Equation 4.11.1, which is integrable.

Example 4.11.15 (Using Fubini to discover that a function is not I-integrable). Let us try to compute

$$\int_{\mathbb{R}^2} \frac{1}{1 + x^2 + y^2}\,|dx\,dy|. \tag{4.11.40}$$

According to Theorem 4.11.14, this integral will be finite (i.e., the function $\frac{1}{1 + x^2 + y^2}$ is I-integrable) if

$$\int_{\mathbb{R}} \left( \int_{\mathbb{R}} \frac{1}{1 + x^2 + y^2}\,|dy| \right) |dx| \tag{4.11.41}$$

is finite, and in that case they are equal. In this case the function

$$h(x) = \int_{\mathbb{R}} \frac{1}{1 + x^2 + y^2}\,|dy| \tag{4.11.42}$$

can be computed by setting $y^2 = (1 + x^2)u^2$, leading to

$$h(x) = \int_{\mathbb{R}} \frac{\sqrt{1 + x^2}}{(1 + x^2)(1 + u^2)}\,|du| = \frac{\pi}{\sqrt{1 + x^2}}. \tag{4.11.43}$$

But $h(x)$ is not integrable, since $1/\sqrt{1 + x^2} > 1/(2x)$ when $x > 1$, and

$$\int_1^A \frac{1}{2x}\,dx = \frac{1}{2}\log A, \tag{4.11.44}$$

which tends to infinity as $A \to \infty$. △
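Both steps of Example 4.11.15 can be checked numerically. The sketch below is illustrative: `h_numeric` evaluates the inner integral of Equation 4.11.42 after the substitution $y = \tan t$ (an assumption of this sketch, different from the substitution used in the text, chosen because it maps $\mathbb{R}$ onto a bounded interval), and `partial_integral` integrates $\pi/\sqrt{1 + x^2}$ over $[-A, A]$, whose exact value $2\pi\,\operatorname{arcsinh} A$ grows without bound.

```python
import math

def h_numeric(x, n=2_000):
    """Approximate h(x) = integral over R of dy / (1 + x^2 + y^2).
    With y = tan(t), dy = dt / cos(t)^2, the integrand becomes
    1 / ((1 + x^2) cos(t)^2 + sin(t)^2) on (-pi/2, pi/2)."""
    dt = math.pi / n
    total = 0.0
    for i in range(n):
        t = -math.pi / 2 + (i + 0.5) * dt
        total += dt / ((1.0 + x * x) * math.cos(t) ** 2 + math.sin(t) ** 2)
    return total

def partial_integral(A, n=100_000):
    """Midpoint rule for the integral of pi / sqrt(1 + x^2) over [-A, A]."""
    dx = 2.0 * A / n
    return sum(math.pi / math.sqrt(1.0 + (-A + (i + 0.5) * dx) ** 2)
               for i in range(n)) * dx

print(h_numeric(2.0), math.pi / math.sqrt(5.0))   # inner integral vs. closed form
for A in (10.0, 100.0, 1000.0):
    print(A, partial_integral(A), 2.0 * math.pi * math.asinh(A))  # keeps growing
```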
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Note how much cleaner this 
statement is than our previous 
change of variables theorem, The- 
orem 4.10.12. In particular, it 
makes no reference to any particu- 
lar behavior of $ on the boundary 
of U. This will be a key to set- 
ting up surface integrals and simi- 
lar things in Chapters 5 and 6. 

Recall that a C l mapping is 
once continuously differentiable: 
its first derivatives exist and are 
continuous. A diffeomorphvtm is a 
differentiable mapping $ : U — * V 
that is bijective (one to one and 
onto), and such that $ -1 : V — * U 
is also differentiable. 


__ Recall (Definition 1.5.17) that 
C is the closure of C : the subset of 
R n made up of the set of all limits 
of sequences in C which converge 
in R n . 


The change of variables formula for improper integrals 

Theorem 4.11.16 (Change of variables for improper integrals). Let
$U$ and $V$ be open subsets of $\mathbb{R}^n$ whose boundaries have volume 0, and
$\Phi : U \to V$ a $C^1$ diffeomorphism, with locally Lipschitz derivative. If $f : V \to \mathbb{R}$
is an integrable function, then $(f \circ \Phi)\,|\det[D\Phi]|$ is also integrable, and

$$\int_V f(v)\,|d^n v| = \int_U (f \circ \Phi)(u)\,|\det[D\Phi(u)]|\,|d^n u|. \tag{4.11.45}$$


Proof. As usual, by considering $f = f^+ - f^-$, it is enough to prove the result
if $f$ is non-negative. Choose $R > 0$, and let $U_R$ be the points $x \in U$ such that
$|x| < R$. Choose $N$ so that

$$\sum_{\substack{C \in \mathcal{D}_N(\mathbb{R}^n) \\ \overline{C} \cap \partial U_R \neq \emptyset}} \operatorname{vol}_n C < \frac{1}{R}, \tag{4.11.46}$$

which is possible since the boundary of $U$ has volume 0. Set

$$X_R = \bigcup_{\substack{C \in \mathcal{D}_N(\mathbb{R}^n) \\ \overline{C} \subset U_R}} C, \tag{4.11.47}$$

and finally $Y_R = \Phi(X_R)$.

Note that $X_R$ is compact, and has boundary of volume 0, since it is a union
of finitely many cubes. The set $Y_R$ is also compact, and its boundary also has
volume 0. Moreover, if $f$ is an I-integrable function on $V$, then in particular
$[f]_R$ is integrable on $Y_R$. Thus Theorem 4.10.12 applies, and gives

$$\int_{X_R} [f]_R \circ \Phi(x)\,|\det[D\Phi(x)]|\,|d^n x| = \int_{Y_R} [f]_R(y)\,|d^n y|. \tag{4.11.48}$$

Now take the supremum of both sides as $R \to \infty$. By the monotone convergence
theorem (Theorem 4.11.13), the left side and right side converge respectively to

$$\int_U f \circ \Phi(x)\,|\det[D\Phi(x)]|\,|d^n x| \quad\text{and}\quad \int_V f(y)\,|d^n y|, \tag{4.11.49}$$

in the sense that they are either both infinite, or both finite and equal. Since
$\int_V f(y)\,|d^n y| < \infty$, they are finite and equal. □


The Gaussian integral 

The integral of the Gaussian bell curve is one of the most important integrals 
in all of mathematics. The central limit theorem (see Section 4.6) asserts that 
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The polar coordinates 
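map (Equation 4.10.4):

$$P : \begin{pmatrix} r \\ \theta \end{pmatrix} \mapsto \begin{pmatrix} r\cos\theta \\ r\sin\theta \end{pmatrix}.$$

Here,

$$x^2 + y^2 = r^2(\cos^2\theta + \sin^2\theta) = r^2.$$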


if you repeat the same experiment over and over, independently each time, and
make some measurement each time, then the probability that the average of
the measurements will lie in an interval $[a, b]$ is

$$\frac{1}{\sqrt{2\pi}\,\sigma} \int_a^b e^{-\frac{(x - \bar{x})^2}{2\sigma^2}}\,dx, \tag{4.11.50}$$

where $\bar{x}$ is the expected value of $x$, and $\sigma$ represents the standard deviation.
Since most of probability is concerned with repeating experiments, the Gaussian
integral is of the greatest importance.



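Example 4.11.17 (Gaussian integral). An integral of immense importance,
which underlies all of probability theory, is

$$\int_{-\infty}^{\infty} e^{-x^2}\,dx. \tag{4.11.51}$$

But the function $e^{-x^2}$ doesn’t have an anti-derivative that can be computed in
elementary terms. 13

One way to compute the integral is to use improper integrals in two dimensions.
Indeed, let us set

$$\int_{-\infty}^{\infty} e^{-x^2}\,dx = A. \tag{4.11.52}$$

Then

$$A^2 = \Bigl( \int_{-\infty}^{\infty} e^{-x^2}\,dx \Bigr) \Bigl( \int_{-\infty}^{\infty} e^{-y^2}\,dy \Bigr) = \int_{\mathbb{R}^2} e^{-(x^2+y^2)}\,|d^2\mathbf{x}|. \tag{4.11.53}$$

Note that we have used Fubini, and we now use the change of variables formula,
passing to polar coordinates:

$$\int_{\mathbb{R}^2} e^{-(x^2+y^2)}\,|d^2\mathbf{x}| = \int_0^{2\pi}\!\!\int_0^{\infty} e^{-r^2}\,r\,dr\,d\theta. \tag{4.11.54}$$

The factor of $r$ which comes from the change of variables makes this
straightforward to evaluate:

$$\int_0^{2\pi}\!\!\int_0^{\infty} r e^{-r^2}\,dr\,d\theta = 2\pi \left[ -\frac{e^{-r^2}}{2} \right]_0^{\infty} = \pi. \tag{4.11.55}$$

So $A = \sqrt{\pi}$. △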
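As a sanity check on Example 4.11.17, the two sides of the polar-coordinates computation can be approximated numerically. This is a minimal sketch: the helper `midpoint`, the cutoff of 10, and the step count are assumptions made for illustration. Truncating at 10 is harmless because $e^{-100}$ is far below floating-point resolution.

```python
import math

def midpoint(func, a, b, steps=200_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    dx = (b - a) / steps
    return sum(func(a + (k + 0.5) * dx) for k in range(steps)) * dx

def gaussian_1d(cutoff=10.0):
    """Approximates A, the integral of exp(-x^2) over the real line."""
    return midpoint(lambda x: math.exp(-x * x), -cutoff, cutoff)

def gaussian_polar(cutoff=10.0):
    """Approximates the polar form of A^2: the integrand does not depend
    on theta, so the theta-integral contributes a factor 2*pi."""
    radial = midpoint(lambda r: r * math.exp(-r * r), 0.0, cutoff)
    return 2.0 * math.pi * radial

if __name__ == "__main__":
    A = gaussian_1d()
    print(A, math.sqrt(math.pi))        # A against sqrt(pi)
    print(A * A, gaussian_polar(), math.pi)  # A^2 against the polar value
```

The square of the one-dimensional estimate and the polar estimate should both agree with $\pi$ to many digits, which is the content of Equations 4.11.53 through 4.11.55.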


13

This is a fairly difficult result; see Integration in Finite Terms by J. F. Ritt,
Columbia University Press, New York, 1948. Of course, it depends on your definition
of elementary; the anti-derivative $\int_0^x e^{-t^2}\,dt$ is a tabulated function, called
the error function.
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When does the integral of a derivative equal the derivative of an 
integral? 

Very often we will need to differentiate a function which is itself an integral. 
This is particularly the case for Laplace transforms and Fourier transforms, 
as we will see below. Given a function that we will integrate with respect to 
one variable, and differentiate with respect to a different variable, under what
circumstances does first integrating and then differentiating give the same result 
as first differentiating, then integrating? Using the dominated convergence 
theorem, we get the following very general result. 


This theorem is a major result 
with far-reaching consequences. 


Theorem 4.11.18 (Exchanging derivatives and integrals). Let
$f(t, x) : \mathbb{R}^{n+1} \to \mathbb{R}$ be a function such that for each fixed $t$, the integral

$$F(t) = \int_{\mathbb{R}^n} f(t, x)\,|d^n x| \tag{4.11.56}$$

exists. Suppose moreover that $D_t f$ exists for all $x$ except perhaps a set of $x$
of volume 0, and that there exists an integrable function $g(x)$ such that

$$\left| \frac{f(s, x) - f(t, x)}{s - t} \right| \le g(x) \tag{4.11.57}$$

for all $s \ne t$. Then $F(t)$ is differentiable, and its derivative is

$$DF(t) = \int_{\mathbb{R}^n} D_t f(t, x)\,|d^n x|. \tag{4.11.58}$$


Proof. Just compute:

$$DF(t) = \lim_{h \to 0} \frac{F(t+h) - F(t)}{h} = \lim_{h \to 0} \int_{\mathbb{R}^n} \frac{f(t+h, x) - f(t, x)}{h}\,|d^n x|$$

$$= \int_{\mathbb{R}^n} \lim_{h \to 0} \frac{f(t+h, x) - f(t, x)}{h}\,|d^n x| = \int_{\mathbb{R}^n} D_t f(t, x)\,|d^n x|; \tag{4.11.59}$$

moving the limit inside the integral sign is justified by the dominated convergence
theorem. □
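Theorem 4.11.18 can be illustrated on a concrete example that is not taken from the text: $f(t, x) = e^{-x^2}\cos(tx)$, for which $F(t) = \sqrt{\pi}\,e^{-t^2/4}$. By the mean value theorem the difference quotients are bounded by $g(x) = |x|e^{-x^2}$, which is integrable, so the hypothesis of the theorem holds. The sketch below (helper names, cutoff, and step sizes are assumptions) compares the derivative computed under the integral sign with a difference quotient of $F$.

```python
import math

def integrate(func, a=-10.0, b=10.0, steps=100_000):
    """Midpoint rule on [a, b]; the cutoff at 10 is an assumption,
    harmless here because exp(-100) is negligible."""
    dx = (b - a) / steps
    return sum(func(a + (k + 0.5) * dx) for k in range(steps)) * dx

def F(t):
    """F(t) = integral of exp(-x^2) cos(t x) over the real line."""
    return integrate(lambda x: math.exp(-x * x) * math.cos(t * x))

def dF_under_integral(t):
    """Integral of D_t f(t, x) = -x exp(-x^2) sin(t x)."""
    return integrate(lambda x: -x * math.exp(-x * x) * math.sin(t * x))

def dF_difference_quotient(t, h=1e-5):
    """Central difference quotient of F at t."""
    return (F(t + h) - F(t - h)) / (2.0 * h)

if __name__ == "__main__":
    t = 1.3
    exact = -(t / 2.0) * math.sqrt(math.pi) * math.exp(-t * t / 4.0)
    print(dF_under_integral(t), dF_difference_quotient(t), exact)
```

Both numerical derivatives should agree with the derivative of the closed form, $-\tfrac{t}{2}\sqrt{\pi}\,e^{-t^2/4}$.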


Applications to the Fourier and Laplace transforms 


Recall (Equation 0.6.7) that
the length, or absolute value, of a
complex number $a + ib$ is $\sqrt{a^2 + b^2}$.
Since $e^{it} = \cos t + i \sin t$, we have
$|e^{it}| = \sqrt{\cos^2 t + \sin^2 t} = 1$.


Fourier transforms and Laplace transforms give important examples of differentiation
under the integral sign. If $f$ is an integrable function on $\mathbb{R}$, then so is
$f(x)e^{i\xi x}$ for each $\xi \in \mathbb{R}$, since

$$|f(x)e^{i\xi x}| = |f(x)|. \tag{4.11.60}$$

So we can consider the function

$$\hat{f}(\xi) = \int_{\mathbb{R}} f(x)e^{i\xi x}\,dx. \tag{4.11.61}$$
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Exercises for Section 4.1: 
Defining the Integral 


Passing from $f$ to $\hat{f}$ is one of the central constructions of mathematical
analysis; many entire books are written about it. We want to use it as an
example of differentiation under the integral sign.

According to Theorem 4.11.18, we will have

$$D\hat{f}(\xi) = \int_{\mathbb{R}} D_\xi \bigl( e^{ix\xi} f(x) \bigr)\,dx = \int_{\mathbb{R}} ix\,e^{ix\xi} f(x)\,dx = i\,\widehat{xf}(\xi), \tag{4.11.62}$$

provided that the difference quotients

$$\left| \frac{e^{i(\xi+h)x} - e^{i\xi x}}{h}\,f(x) \right| = \left| \frac{e^{ihx} - 1}{h} \right| |f(x)| \tag{4.11.63}$$

are all bounded by a single integrable function. Since $|e^{ia} - 1| = 2|\sin(a/2)| \le |a|$
for any real number $a$, we see that this will be satisfied if $|xf(x)|$ is an
integrable function.
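The estimate used above can be spelled out as follows; this is a short worked derivation added for the reader's convenience, using only $e^{ia} = \cos a + i\sin a$ and the inequality $|\sin u| \le |u|$.

```latex
\begin{align*}
|e^{ia} - 1|^2 &= (\cos a - 1)^2 + \sin^2 a = 2 - 2\cos a = 4\sin^2(a/2), \\
|e^{ia} - 1| &= 2\,|\sin(a/2)| \le 2 \cdot \frac{|a|}{2} = |a|, \\
\left| \frac{e^{ihx} - 1}{h} \right| |f(x)| &\le \frac{|hx|}{|h|}\,|f(x)| = |x f(x)| \qquad (h \ne 0).
\end{align*}
```

So the function $g(x) = |xf(x)|$ plays the role of the dominating function in Theorem 4.11.18, independently of $h$.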

Thus the Fourier transform turns differentiation into multiplication, and cor- 
respondingly integration into division. This is a central idea in the theory of 
partial differential equations. 


4.12 Exercises for Chapter Four


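4.1.1 (a) What is the two-dimensional volume (i.e., area) of a dyadic cube
$C \in \mathcal{D}_3(\mathbb{R}^2)$? of $C \in \mathcal{D}_4(\mathbb{R}^2)$? of $C \in \mathcal{D}_5(\mathbb{R}^2)$?

(b) What is the volume of a dyadic cube $C \in \mathcal{D}_3(\mathbb{R}^3)$? of $C \in \mathcal{D}_4(\mathbb{R}^3)$? of
$C \in \mathcal{D}_5(\mathbb{R}^3)$?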

4.1.2 In each group of dyadic cubes below, which has the smallest volume? 
the largest? 

(a)C m ; Cr j-] ;C ril (b) C G P 2 (R 3 ); C G P,(R 3 ); C G Z> S (R 3 ) 

I2J’ 4 [ 2 [ 2 l 2 j ' 6 

4.1.3 What is the volume of each of the following dyadic cubes? What 
dimension is the volume (i.e., are the cubes two-dimensional, three-dimensional 
or what)? What information is given below that you don’t need to answer those 
two questions? 


T 

(b)C 

,3 

O' 

(c) C 

O' 

(d)C 

O' 

.2. 


1 

a 

1 


1 



3 


1 

i3 

. 4 . 





.1. 




4.1.4 Prove Proposition 4.1.18. 

4.1.5 Prove that the distance between two points $x, y$ in the same cube $C \in
\mathcal{D}_N(\mathbb{R}^n)$ is
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i a 


Figure 4.1.8. 

The heavy line is the graph of
the function $x\chi_{[0,a)}(x)$.


In Exercises 4.1.8 and 4.1.9, 
you need to distinguish between 
the cases where a and b are 
“dyadic,” i.e., endpoints of dyadic 
intervals, and the cases where they 
are not. 


4.1.6 Consider the function

$$f(x) = \begin{cases} 0 & \text{if } |x| > 1, \text{ or } x \text{ is rational} \\ 1 & \text{if } |x| \le 1, \text{ and } x \text{ is irrational.} \end{cases}$$

(a) What value do you get for the “left-hand Riemann sum,” where for the
interval

$$C_{k,N} = \left\{ x \;\middle|\; \frac{k}{2^N} \le x < \frac{k+1}{2^N} \right\}$$

you choose the left endpoint $k/2^N$? The right-hand Riemann sum? The midpoint
Riemann sum?

(b) What value do you get for the “geometric mean” Riemann sum, where
the point you choose in each $C_{k,N}$ is the geometric mean of the two endpoints,

$$\sqrt{\frac{k}{2^N} \cdot \frac{k+1}{2^N}} = \frac{\sqrt{k(k+1)}}{2^N}\,?$$


4.1.7 (a) Calculate 

(b) Calculate directly from the definition the integrals

$$\int_{\mathbb{R}} x\chi_{[0,1)}(x)\,|dx|, \quad \int_{\mathbb{R}} x\chi_{[0,1]}(x)\,|dx|, \quad \int_{\mathbb{R}} x\chi_{(0,1]}(x)\,|dx|, \quad \int_{\mathbb{R}} x\chi_{(0,1)}(x)\,|dx|.$$

In particular show that they all exist, and that they are equal.

4.1.8 (a) Calculate 1 . 

(b) Choose $a > 0$, and calculate directly from the definition the integrals

$$\int_{\mathbb{R}} x\chi_{[0,a)}(x)\,|dx|, \quad \int_{\mathbb{R}} x\chi_{[0,a]}(x)\,|dx|, \quad \int_{\mathbb{R}} x\chi_{(0,a]}(x)\,|dx|, \quad \int_{\mathbb{R}} x\chi_{(0,a)}(x)\,|dx|.$$

(The first is shown in Figure 4.1.8.) In particular show that they all exist, and
that they are equal.

(c) If $a < b$, show that $x\chi_{[a,b]}$, $x\chi_{[a,b)}$, $x\chi_{(a,b]}$, $x\chi_{(a,b)}$ are all integrable and
compute their integrals, which are all equal.


4.1.9 (a) Calculate $\sum_{i=0}^{n} i^2$.

(b) Choose $a > 0$, and calculate directly from the definition the integrals

$$\int_{\mathbb{R}} x^2\chi_{[0,a)}(x)\,|dx|, \quad \int_{\mathbb{R}} x^2\chi_{[0,a]}(x)\,|dx|, \quad \int_{\mathbb{R}} x^2\chi_{(0,a]}(x)\,|dx|, \quad \int_{\mathbb{R}} x^2\chi_{(0,a)}(x)\,|dx|.$$

In particular show that they all exist, and that they are equal.

(c) If $a < b$, show that $x^2\chi_{[a,b]}$, $x^2\chi_{[a,b)}$, $x^2\chi_{(a,b]}$, $x^2\chi_{(a,b)}$ are all integrable
and compute their integrals, which are all equal.
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4.1.10 Let Q C R 2 be the unit square 0 < x, y < 1. Show that the function 

/(£)=*"(* -»)*«(*) 

is integrable by providing an explicit bound for $U_N(f) - L_N(f)$ which tends to
0 as $N \to \infty$.

4.1.11 (a) Let $A = [a_1, b_1] \times \cdots \times [a_n, b_n]$ be a box in $\mathbb{R}^n$, of constant density
$\mu = 1$. Show that the center of gravity is the center of the box, i.e., the point
$c$ with coordinates $c_i = (a_i + b_i)/2$.

(b) Let $A$ and $B$ be two disjoint bodies, with densities $\mu_1$ and $\mu_2$, and set
$C = A \cup B$. Show that

$$\bar{x}(C) = \frac{M(A)\bar{x}(A) + M(B)\bar{x}(B)}{M(A) + M(B)}.$$

4.1.12 Define the dilation by $a$ of a function $f : \mathbb{R}^n \to \mathbb{R}$ by the
formula

A*/(x) = / . Show that if / is integrable, then so is D 2 n/, and 


/ £W(x)|<Tx| « 2 n [ /(x) IcTxl. 

Jw n Jtt n 

(b) Recall that the canonical cubes are half open, half closed. (You should 
have used this in part (a)). Show that the closed cubes also have the same 
volume. (This is remarkably harder to carry out than you might expect.) 



4.1.13 Complete the proof of Lemma 4.1.15. 


4.1.14 Evaluate the limit 


lim 

N-+00 


N 2 N 


1 4* *4» 


k=l 1=1 


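4.1.15 (a) What are the upper and lower sums $U_1(f)$ and $L_1(f)$ for the
function

$$f\begin{pmatrix} x \\ y \end{pmatrix} = \begin{cases} x^2 + y^2 & \text{if } 0 \le x, y \le 1 \\ 0 & \text{otherwise} \end{cases}$$

i.e., the upper and lower sums for the partition $\mathcal{D}_1(\mathbb{R}^2)$, shown in Figure 4.1.15?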

(b) Compute the integral of the function / and show that it is between the 
upper and lower sum. 


4.1.16 (a) Does a set with volume 0 have volume? 

(b) Show that if $X$ and $Y$ have volume 0, then $X \cap Y$, $X \times Y$, and $X \cup Y$
have volume 0.
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Exercise 4.1.17 shows that the 
behavior of an integrable function 
$f : \mathbb{R}^n \to \mathbb{R}$ on the boundaries of
the cubes of $\mathcal{D}_N$ does not affect the
integral. 

Starred exercises are difficult; 
exercises with two stars are more 
difficult yet. 


Hint: You may assume that 
the support of / is contained in 
Q, and that |/| < 1. Choose 
€ > 0, then choose N i to make 
U N U)-L N {f) < «/2, then choose 
N 2 > Ni to make vo1(X^d N2 < 
if 2. Now show that for N > N 2 , 

Un(/) - LnU) < « and 
T7 N (f)-L N (f)<e. 


Exercises for Section 4.2: 
Probability 


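(c) Show that $\left\{ \begin{pmatrix} x_1 \\ 0 \end{pmatrix} \in \mathbb{R}^2 \;\middle|\; 0 \le x_1 \le 1 \right\}$ has volume 0 (i.e., $\operatorname{vol}_2 = 0$).

(d) Show that $\left\{ \begin{pmatrix} x_1 \\ x_2 \\ 0 \end{pmatrix} \in \mathbb{R}^3 \;\middle|\; 0 \le x_1, x_2 \le 1 \right\}$ has volume 0 (i.e., $\operatorname{vol}_3 = 0$).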


*4.1.17 (a) Let $S$ be the unit cube in $\mathbb{R}^n$, and choose $a \in [0, 1)$. Show that
the subset $\{x \in S \mid x_i = a\}$ has $n$-dimensional volume 0.

(b) Let $X_{\partial\mathcal{D}_N}$ be the set made up of the boundaries of all the cubes $C \in \mathcal{D}_N$.
Show that $\operatorname{vol}_n(X_{\partial\mathcal{D}_N} \cap S) = 0$.

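(c) For each

$$C = \left\{ x \in \mathbb{R}^n \;\middle|\; \frac{k_i}{2^N} \le x_i < \frac{k_i + 1}{2^N} \right\} \in \mathcal{D}_N(\mathbb{R}^n),$$

set

$$\mathring{C} = \left\{ x \in \mathbb{R}^n \;\middle|\; \frac{k_i}{2^N} < x_i < \frac{k_i + 1}{2^N} \right\} \quad\text{and}\quad \overline{C} = \left\{ x \in \mathbb{R}^n \;\middle|\; \frac{k_i}{2^N} \le x_i \le \frac{k_i + 1}{2^N} \right\}.$$

These are called the interior and the closure of $C$ respectively.
Show that if $f : \mathbb{R}^n \to \mathbb{R}$ is integrable, then the four limits

$$\lim_{N \to \infty} \sum_{C \in \mathcal{D}_N(\mathbb{R}^n)} M_{\mathring{C}}(f) \operatorname{vol}_n(C), \qquad \lim_{N \to \infty} \sum_{C \in \mathcal{D}_N(\mathbb{R}^n)} m_{\mathring{C}}(f) \operatorname{vol}_n(C),$$

$$\lim_{N \to \infty} \sum_{C \in \mathcal{D}_N(\mathbb{R}^n)} M_{\overline{C}}(f) \operatorname{vol}_n(C), \qquad \lim_{N \to \infty} \sum_{C \in \mathcal{D}_N(\mathbb{R}^n)} m_{\overline{C}}(f) \operatorname{vol}_n(C)$$

all exist, and are all equal to $\int_{\mathbb{R}^n} f(x)\,|d^n x|$.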

(d) Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is integrable, and that $f(-x) = -f(x)$. Show that
$\int_{\mathbb{R}^n} f\,|d^n x| = 0$.


4.2.1 (a) Suppose an experiment consists of throwing two dice, each of which 
is loaded so that it lands on 4 half the time, while the other outcomes are 
equally likely. The random variable / gives the total obtained on each throw. 
What are the probability weights for each outcome? 

(b) Repeat part (a), but this time one die is loaded as above, and the other 
falls on 3 half the time, with the other outcomes equally likely. 

4.2.2 Suppose a probability space X consists of n outcomes, {1,2, ...,n}, 
each with probability $1/n$. Then a random function $f$ on $X$ can be identified
with an element $\vec{f} \in \mathbb{R}^n$.
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Exercises for Section 4.3: 

What Functions 
Can Be Integrated 

Hint for Exercise 4.3.1 (a): im- 
itate the proof of Theorem 4.3.6, 
writing the unit circle as the union 
of four graphs of functions: $y =
\sqrt{1 - x^2}$ for $|x| \le \sqrt{2}/2$, and the
three other curves obtained by rotating
this curve around the origin
by multiples of $\pi/2$.


We will give further applica- 
tions of this result in Exercise 
4.5.17. 


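(a) Show that $E(f) = \frac{1}{n}(\vec{f} \cdot \vec{1})$, where $\vec{1} = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}$.

(b) Show that

$$\operatorname{Var}(f) = \frac{1}{n}\,\bigl|\vec{f} - E(f)\vec{1}\bigr|^2, \qquad \sigma(f) = \frac{1}{\sqrt{n}}\,\bigl|\vec{f} - E(f)\vec{1}\bigr|.$$

(c) Show that

$$\operatorname{Cov}(f, g) = \frac{1}{n}\bigl(\vec{f} - E(f)\vec{1}\bigr) \cdot \bigl(\vec{g} - E(g)\vec{1}\bigr); \qquad \operatorname{corr}(f, g) = \cos\theta,$$

where $\theta$ is the angle between the vectors $\vec{f} - E(f)\vec{1}$ and $\vec{g} - E(g)\vec{1}$.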

4.3.1 (a) Give an explicit upper bound for the number of squares $C \in \mathcal{D}_N(\mathbb{R}^2)$
needed to cover the unit circle in $\mathbb{R}^2$.

(b) Now try the same exercise for the unit sphere $S^2 \subset \mathbb{R}^3$.

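4.3.2 For any real numbers $a < b$, let

$$Q^n_{a,b} = \{ x \in \mathbb{R}^n \mid a \le x_i \le b \text{ for all } 1 \le i \le n \},$$

and let $P^n_{a,b} \subset Q^n_{a,b}$ be the subset where $a \le x_1 \le x_2 \le \cdots \le x_n \le b$.

Let $f : \mathbb{R}^n \to \mathbb{R}$ be an integrable function that is symmetric in the sense that

$$f(x_1, \dots, x_n) = f(x_{\sigma(1)}, \dots, x_{\sigma(n)})$$

for any permutation $\sigma$ of the symbols $1, 2, \dots, n$.

(a) Show that

$$\int_{Q^n_{a,b}} f\,|d^n x| = n! \int_{P^n_{a,b}} f\,|d^n x|.$$

(b) Let $f : [a, b] \to \mathbb{R}$ be an integrable function. Show that

$$\int_{P^n_{a,b}} f(x_1) f(x_2) \cdots f(x_n)\,|d^n x| = \frac{1}{n!} \Bigl( \int_a^b f(x)\,|dx| \Bigr)^n.$$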

4.3.3 Prove Corollary 4.3.11. 

4.3.4 Let $P$ be the region $x^2 < y < 1$. Prove that the integral

$$\int_P \sin y^2\,|dx\,dy|$$

exists. You may either apply theorems or prove the result directly. If you use 
theorems, you must show that they actually apply. 
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Exercises for Section 4.4: 
Integration and Measure Zero 


4.4.1 Show that the same sets have measure 0 regardless of whether you define
measure 0 using open or closed boxes. Use Definition 4.1.13 of n-dimensional 
volume to prove this equivalence. 


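4.4.2 Show that $X \subset \mathbb{R}^n$ has measure 0 if and only if there exists an infinite
sequence of balls

$$B_i = \{ x \in \mathbb{R}^n \mid |x - a_i| < r_i \} \quad\text{with}\quad \sum_{i=1}^{\infty} r_i^n < \epsilon$$

such that $X \subset \bigcup_{i=1}^{\infty} B_i$.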


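4.4.3 Show that if $X$ is a subset of $\mathbb{R}^n$ such that for any $\epsilon > 0$, there exists
a sequence of pavable sets $B_i$, $i = 1, 2, \dots$ satisfying

$$X \subset \bigcup_{i=1}^{\infty} B_i \quad\text{and}\quad \sum_{i=1}^{\infty} \operatorname{vol}_n(B_i) < \epsilon,$$

then $X$ has measure 0.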


4.4.4 (a) Show that $\mathbb{Q} \subset \mathbb{R}$ has measure 0. More generally, show that any

countable subset of R has measure 0. 

(b) Show that a countable union of sets of measure 0 has measure 0. 

**4.4.5 Consider the subset $U \subset [0, 1]$ which is the union of the open intervals

$$\left( \frac{p}{q} - \frac{C}{q^3},\; \frac{p}{q} + \frac{C}{q^3} \right)$$

for all rational numbers $p/q \in [0, 1]$. Show that for $C > 0$ sufficiently small,
$U$ is not pavable. What would happen if the 3 were replaced by a 2? (This is
really hard.)


Exercises for Section 4.5: 
Fubini’s Theorem 
and Iterated Integrals 


4.5.1 In Example 4.5.2, why can you ignore the fact that the line x = 1 is 
counted twice? 

4.5.2 (a) Set up the multiple integral for Example 4.5.2, where the outer 
integral is with respect to y rather than x. Be careful about which square root 
you are using. 

(b) If in (a) you replace +y/y by -y/y and vice versa, what would be the 
corresponding region of integration? 


4.5.3 Set up the multiple integral $\int \bigl( \int f\,dx \bigr)\,dy$ for the truncated triangle
shown in Figure 4.5.2. 
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4.5.4 (a) Show that if 

$$c_n = \int_{-1}^{1} (1 - t^2)^{(n-1)/2}\,dt, \quad\text{then}\quad c_n = \frac{n-1}{n}\,c_{n-2}, \quad\text{for } n \ge 2.$$

(b) Show that $c_0 = \pi$ and $c_1 = 2$.

4.5.5 Again for Example 4.5.6, show that

$$\beta_{2k} = \frac{\pi^k}{k!} \quad\text{and}\quad \beta_{2k+1} = \frac{\pi^k\, k!\, 2^{2k+1}}{(2k+1)!}.$$


4.5.6 Write each of the following double integrals as iterated integrals in two 
ways, and compute them: 

(a) The integral of sin(x + y) over the region x 2 < y < 2. 

(b) The integral of $x^2 + y^2$ over the region $1 \le |x|, |y| \le 2$.


4.5.7 In Example 4.5.7, compute the integral without assuming that the first 
dart falls below the diagonal (see the footnote after Equation 4.5.25). 


4.5.8 Write as an iterated integral, and in three different ways, the triple 
integral of xyz over the region x, y, z > 0, x + 2y + 3z < 1. 


4.5.9 (a) Use Fubini’s theorem to express

$$\int_0^{\pi} \Bigl( \int_y^{\pi} \frac{\sin x}{x}\,dx \Bigr)\,dy \quad\text{as a double integral.}$$

(b) Write the integral as an iterated integral in the other order. 

(c) Compute the integral. 


4.5.10 (a) Represent the iterated integral 



as the integral of s/ye~ v over a region of the plane which you should sketch. 

(b) Use Fubini’s theorem to make this integral into an iterated integral, first 
with respect to x and then with respect to y. 

(c) Evaluate the integral. 


4.5.11 You may recall that the proof of Theorem 3.3.9, that

$$D_1(D_2(f)) = D_2(D_1(f)),$$

was surprisingly difficult, and only true if the second partials are continuous.
There is an easier proof that uses Fubini’s theorem.
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(a) Show that if U C M 2 is an open set, and /:£/-*& is a function such 
that 

$$D_2(D_1(f)) \quad\text{and}\quad D_1(D_2(f))$$

both exist and are continuous, and if $D_1(D_2(f))\begin{pmatrix} a \\ b \end{pmatrix} \ne D_2(D_1(f))\begin{pmatrix} a \\ b \end{pmatrix}$ for
some point $\begin{pmatrix} a \\ b \end{pmatrix}$, then there exists a square $S \subset U$ such that either
$D_2(D_1(f)) > D_1(D_2(f))$ on $S$ or $D_1(D_2(f)) > D_2(D_1(f))$ on $S$.

(b) Apply Fubini’s theorem to the double integral

$$\iint_S \bigl( D_2(D_1(f)) - D_1(D_2(f)) \bigr)\,dx\,dy$$

to derive a contradiction.

(c) The function 

'(;)-{ 

is the standard example of a 
happens to the proof above? 

4.5.12 (a) Set up in two different ways the integral of $\sin y$ over the region
$0 \le x \le \cos y$, $0 \le y \le \pi/6$ as an iterated integral.

(b) Write the integral 

- dxdy 
x 

as an integral, first integrating with respect to y, then with respect to x. 

4.5.13 Set up the iterated integral to find the volume of the slice of cylinder
x 2 + y 2 < 1 between the planes 

„ 1 1 
z = 0, z = 2, y = y = 

4.5.14 Compute the integral of the function $z$ over the region $R$ described
by the inequalities $x \ge 0$, $y \ge 0$, $z \ge 0$, $x + 2y + 3z \le 1$.

4.5.15 Compute the integral of the function $|y - x^2|$ over the unit square
$0 \le x, y \le 1$.

4.5.16 Find the volume of the region bounded by the surfaces

$$z = x^2 + y^2 \quad\text{and}\quad z = 10 - x^2 - y^2.$$



0 otherwise, 

function where Di(D 2 f)) ^ D 2 (Di(f)). What 
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Figure 4.5.18. 

The region

$$\frac{x^2}{(z^3 - 1)^2} + \frac{y^2}{(z^3 + 1)^2} \le 1, \qquad -1 \le z \le 1,$$

which looks like a peculiar pillow.


Exercises for Section 4.6: 
Numerical Methods 
of Integration 


4.5.17 Recall from Exercise 4.3.2 the definitions of $P^n_{a,b} \subset Q^n_{a,b}$. Apply the
result of Exercise 4.3.2 to compute the following integrals. 14

(a) Let $M_r(x)$ be the $r$th largest of the coordinates $x_1, \dots, x_n$ of $x$. Then

$$\int_{Q^n_{0,1}} M_r(x)\,|d^n x| = \frac{n + 1 - r}{n + 1}.$$


(b) Let n > 2 and 0 < 6 < 1. Then 



nb-b n 
n - 1 


4.5.18 What is the volume of the region

$$\frac{x^2}{(z^3 - 1)^2} + \frac{y^2}{(z^3 + 1)^2} \le 1, \qquad -1 \le z \le 1,$$

shown in Figure 4.5.18?

4.5.19 What is the $z$-coordinate of the center of gravity of the region

$$\frac{x^2}{(z^3 - 1)^2} + \frac{y^2}{(z^3 + 1)^2} \le 1, \qquad 0 \le z \le 1.$$


4.6.1 (a) Write out the sum given by Simpson’s method with 1 step, for the
integral

$$\int_Q f(x)\,|d^n x|$$

when $Q$ is the unit square in $\mathbb{R}^2$ and the unit cube in $\mathbb{R}^3$. There should be 9
and 27 terms respectively.

(b) Evaluate these sums when

$$f\begin{pmatrix} x \\ y \end{pmatrix} = \frac{1}{1 + x + y},$$

and compare to the exact value of the integral.


4.6.2 Find the weights and control points for the Gaussian integration scheme 
by solving the system of equations 4.6.9, for k = 2, 3, 4, 5. Hint: Entering the 
equations is fairly easy. The hard part is finding good initial conditions. The 
following work: 


A: = 1 W\ — 17 x\ = .57 


W\ — .6 x\ = .3 

W2 = 4 X2 = .8 


14 This exercise is borrowed from Tiberiu Trif, “Multiple integrals of symmetric 
functions,” American Mathematical Monthly , Vol. 104, No. 7 (1997), pp. 605-608). 
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uj j — .35 X] — • .2 

w\ = .5 X\ = .2 

u?9 = .3 X2 — *5 

A: = 3 tu 2 = .3 x 2 = .7 * = 4 

1^3 = .2 £3 = .8 

tt >3 = .2 £3 = .9 

W4 = .1 £4 = .95 

The pattern should be fairly clear; experiment to find initial conditions when 
A: = 5. 


w\ = .5 £1 = .2 

A: = 3 W2 = -3 £ 2 = .7 

tt >3 = 2 £3 = .9 


4.6.3 Find the formula relating the weights Wi and the sampling points X t 
needed to compute f b f(x)dx to the weights u>* and the points £, appropriate 
for f* j f(x)dx. 

4.6.4 (a) Find the equations that must be satisfied by points £1 < • • • < x p 
and weights w\ < • • • < w p so that the equation 


rOC P 

/ p(x)e~ x dx = V «>*/(**) 

^0 rr? 


is true for all polynomials p of degree < d. 

(b) For what number d does this lead to as many equations as unknowns? 

(c) Solve the system of equations when p = 1. 

(d) Use Newton’s method to solve the system for p = 2, . . . , 5. 

(e) For each of the degrees above, approximate 

$$\int_0^{\infty} e^{-x} \sin x\,dx \quad\text{and}\quad \int_0^{\infty} e^{-x} \log x\,dx,$$

and compare the approximations with the exact values. 

4.6.5 Repeat the problem above, but this time for the weight $e^{-x^2}$, i.e., find
points $x_i$ and $w_i$ such that

$$\int_{-\infty}^{\infty} p(x) e^{-x^2}\,dx = \sum_{i=1}^{k} w_i\,p(x_i)$$

is true for all polynomials of degree $\le 2k - 1$.

4.6.6 (a) Show that if 


/ b J2U r b n 

/(£)d£ = 2 ^,Cif(xi) and / g(x)dx = Cig(xi), 
i=l Ja i^i 


I M M nx)9{y)ldxdyl 

1 1 ' *=1 j=i 


then 
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(b) What is the Simpson approximation with one step of the integral 

$$\int_{[0,1] \times [0,1]} x^2 y^3\,|dx\,dy|.$$

4.6.7 Show that there exist $c$ and $u$ such that

$$\int_{-1}^{1} \frac{f(x)}{\sqrt{1 - x^2}}\,dx = c\,\bigl( f(u) + f(-u) \bigr)$$

when $f$ is a polynomial of degree $d \le 3$.


Exercise 4.6.8 was largely inspired
by a corresponding exercise
in Michael Spivak’s Calculus.


*4.6.8 In this exercise we will sketch a proof of Equation 4.6.3. There are
many parts to the proof, and many of the intermediate steps are of independent
interest.

(a) Show that if the function $f$ is continuous on $[a_0, a_n]$ and $n$ times differentiable
on $(a_0, a_n)$, and $f$ vanishes at the $n + 1$ distinct points $a_0 < a_1 < \cdots < a_n$,
then there exists $c \in (a_0, a_n)$ such that $f^{(n)}(c) = 0$.

(b) Now prove the same thing if the function vanishes with multiplicities.
The function $f$ vanishes with multiplicity $k + 1$ at $a$ if $f(a) = f'(a) = \cdots =
f^{(k)}(a) = 0$. Then if $f$ vanishes with multiplicity $k_i + 1$ at $a_i$, and if $f$ is
$N = n + \sum_{i=0}^{n} k_i$ times differentiable, then there exists $c \in (a_0, a_n)$ such that
$f^{(N)}(c) = 0$.


Hint for Exercise 4.6.8(c):
Show that the function $g(t) =
q(x)(f(t) - p(t)) - q(t)(f(x) - p(x))$
vanishes $n + 2$ times; and recall
that the $n + 1$st derivative of a
polynomial of degree $n$ is zero.


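(c) Let $f$ be $n + 1$ times differentiable on $[a_0, a_n]$, and let $p$ be a polynomial of
degree $n$ (in fact the unique one, by Exercise 2.5.16) such that $f(a_i) = p(a_i)$,
and let

$$q(x) = \prod_{i=0}^{n} (x - a_i).$$

Show that there exists $c \in (a_0, a_n)$ such that

$$f(x) - p(x) = \frac{f^{(n+1)}(c)}{(n+1)!}\,q(x).$$

(d) Let $f$ be 4 times continuously differentiable on $[a, b]$, and $p$ be the polynomial
of degree 3 such that

$$p(a) = f(a), \quad p\Bigl(\frac{a+b}{2}\Bigr) = f\Bigl(\frac{a+b}{2}\Bigr), \quad p'\Bigl(\frac{a+b}{2}\Bigr) = f'\Bigl(\frac{a+b}{2}\Bigr), \quad p(b) = f(b).$$

Show that

$$\int_a^b f(x)\,dx = \frac{b - a}{6} \Bigl( f(a) + 4 f\Bigl(\frac{a+b}{2}\Bigr) + f(b) \Bigr) - \frac{(b - a)^5}{2880}\,f^{(4)}(c)$$

for some $c \in [a, b]$.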

(e) Prove Formula 4.6.3: If / is four times continuously differentiable, then 
there exists c £ (a, 6) such that 



(b - a) 5 
2880n 4 


/ <4) ( c). 
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Exercises for Section 4.7: 
Other Pavings 

Hint for Exercise 4.7.1: This 
is a fairly obvious Riemann sum. 
You are allowed (and encouraged) 
to use all the theorems of Section 
4.3. 


Exercises for Section 4.8: 
Determinants 


Hint: think of multiplying the 
column through by 2, or by -4. 


4.7.1 (a) Show that the limit 

1 V — ' 

J“” vs 2- 


oc N 3 

0 <n,m<N 

(b) Compute the limit above. 


me 


-nm(N 4 


exists. 


4.7.2 (a) Let $A(R)$ be the number of points with integer entries in the disk
$x^2 + y^2 \le R^2$. Show that the limit

$$\lim_{R \to \infty} \frac{A(R)}{R^2}$$

exists, and evaluate it.

(b) Now do the same for the function B(R) which counts how many points 
of the triangular grid 

{ n ( q ) + m n, m € X | are in the disc. 


4.8.1 Compute the determinants of the following matrices, using development
by the first row: 


-1 

-2 

3 

o- 


’I 

1 

2 

r 


-1 

2 

3 

4‘ 

4 

5 

0 

-1 

1 

2 

2 

1 

(b) 

0 

1 

3 

2 

4 

3 

1 

1 

(C) 

0 

3 

1 

0 

-1 

1 

3 

1 

.3 

2 

1 

0. 


.2 

1 

0 

4. 


.1 

2 

-2 

0. 


4.8.2 (a) What is the determinant of the matrix

$$\begin{bmatrix} b & a & 0 & 0 \\ 0 & b & a & 0 \\ 0 & 0 & b & a \\ a & 0 & 0 & b \end{bmatrix}?$$

(b) What is the determinant of the corresponding $n \times n$ matrix, with $b$’s on
the diagonal and $a$’s on the slanted line above the diagonal and in the lower
left-hand corner?

(c) For each $n$, what are the values of $a$ and $b$ for which the matrix in (b) is
not invertible? Hint: remember complex numbers.


4.8.3 Spell out exactly what the three conditions defining the determinant 
(Definition 4.8.1) mean for 2 x 2 matrices, and prove them. 


4.8.4 (a) Show that if a square matrix has a column of zeroes, its determinant 

must be zero, using the multilinearity property (property (1)). 

(b) Show that if two columns of a square matrix are equal, the determinant 
must be zero. 
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4.8.5 If A and B are n x n matrices, and A is invertible, show that the 


function 


f(B) = 


det ( AB ) 
det A 


has properties (1), (2), and (3) (multilinearity, antisymmetry, normalization) 
and that therefore f(B) = det B. 


4.8.6 Give an alternative proof of Theorem 4.8.11, by showing that 

(a) If all the entries on the diagonal are nonzero, you can use column opera- 
tions (of type 2) to make the matrix diagonal, without changing the entries on 
the main diagonal. 

(b) If some entry on the main diagonal is zero, row operations can be used 
to get a column of zeroes. 


4.8.7 Prove Theorem 4.8.14: If $A$ is an $n \times n$ matrix and $B$ is an $m \times m$
matrix, then for the $(n + m) \times (n + m)$ matrix formed with these as diagonal
elements,

$$\det \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix} = \det A \det B.$$


4.8.8 What elementary matrices are permutation matrices? Describe the 
corresponding permutation. 

4.8.9 Given two permutations, $\sigma$ and $\tau$, show that the transformation that
associates to each its matrix ($M_\sigma$ and $M_\tau$ respectively) is a group homomorphism:
it satisfies $M_{\sigma \circ \tau} = M_\sigma M_\tau$.

4.8.10 In Example 4.8.17, verify that the signature of $\sigma_5$ and $\sigma_6$ is $-1$.

4.8.11 Show by direct computation that if $A$ and $B$ are $2 \times 2$ matrices, then
$\operatorname{tr}(AB) = \operatorname{tr}(BA)$.

4.8.12 Show that if $A$ and $B$ are $n \times n$ matrices, then $\operatorname{tr}(AB) = \operatorname{tr}(BA)$.
Start with Corollary 4.8.22, and set $C = P$, $D = AP^{-1}$. This proves the
formula when $C$ is invertible; complete the proof by showing that if $C_n$ is a
sequence of matrices converging to $C$, and $\operatorname{tr}(C_n D) = \operatorname{tr}(D C_n)$ for all $n$, then
$\operatorname{tr}(CD) = \operatorname{tr}(DC)$.

*4.8.13 For a matrix A , we defined the determinant D(A) recursively by 
development according to the first column. Show that it could have equally 
well been defined, with the same result, as development according to the first 
row. Think of using Theorem 4.8.10. It can also be proved, with more work, 
by induction on the size of the matrix. 
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*4.8.14 (a) Show that if A is an n x n matrix of rank n - 1, then [Ddet(y4)J : 

$\operatorname{Mat}(n, n) \to \mathbb{R}$ is not the zero transformation.

(b) Show that if $A$ is an $n \times n$ matrix with $\operatorname{rank}(A) \le n - 2$, then $[D\det(A)] :
\operatorname{Mat}(n, n) \to \mathbb{R}$ is the zero transformation.


Exercises for Section 4.9:
Volumes and Determinants

4.9.1 Prove Theorem 4.9.1 by showing that $\operatorname{vol}_n T(Q)$ satisfies the axiomatic
definition of the absolute value of determinant (see Definition 4.8.1).



Figure 4.9.2. 


Exercise 4.9.3, part (a): Yes, do 
use Fubini 

Exercise 4.9.3, part (b): No,
do not use Fubini. Find a linear
transformation $S$ such that
$S(T_1) = T_2$.


4.9.2 Prove Equation 4.9.13 by “dissection,” as suggested in Figure 4.9.2. 

4.9.3 (a) What is the volume of the tetrahedron T\ with vertices 


'o' 


P 1 w 

1 


'o' 

0 


0 

1 

1 

0 


0 


0 


(b) What is the volume of the tetrahedron T 2 with vertices 


o' 


' 2 ' 


'-1' 


r- 2 i 

0 

0 

L * 


1 

1 

L J 

> 

3 

1 

L J 

> 

1 

10 cs 

1 

1 


4.9.4 What is the $n$-dimensional volume of the region

$$\{ x \in \mathbb{R}^n \mid x_i \ge 0 \text{ for all } i = 1, \dots, n \text{ and } x_1 + \cdots + x_n \le 1 \}?$$


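4.9.5 Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be given by the matrix

$$\begin{bmatrix} 1 & 0 & 0 & \dots & 0 \\ 2 & 2 & 0 & \dots & 0 \\ 3 & 3 & 3 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ n & n & n & \dots & n \end{bmatrix},$$

and let $A \subset \mathbb{R}^n$ be the region given by

$$|x_1| + |x_2|^2 + |x_3|^3 + \cdots + |x_n|^n \le 1.$$

What is $\operatorname{vol}_n(T(A))/\operatorname{vol}_n(A)$?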

4.9.6 What is the $n$-dimensional volume of the region

$$\{ x \in \mathbb{R}^n \mid x_i \ge 0 \text{ for all } i = 1, \dots, n \text{ and } x_1 + 2x_2 + \cdots + n x_n \le n \}?$$


4.9.7 

Let $q(x)$ be a continuous function on $\mathbb{R}$, and suppose that $f(x)$ and $g(x)$
satisfy the differential equation

$$f''(x) = q(x) f(x), \qquad g''(x) = q(x) g(x).$$
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Exercises for Section 4,10: 
Change of Variables 


Hint for Exercise 4.10.4 (a): use
the variables $u = x/a$, $v = y/b$.


Express the area $A(x)$ of the parallelogram spanned by

$$\begin{bmatrix} f(x) \\ f'(x) \end{bmatrix}, \quad \begin{bmatrix} g(x) \\ g'(x) \end{bmatrix}$$

in terms of $A(0)$. Hint: you may want to differentiate $A(x)$.


4.9.8 (a) Find an expression for the area of the parallelogram spanned by
$\vec{v}_1$ and $\vec{v}_2$, in terms of $|\vec{v}_1|$, $|\vec{v}_2|$, and $|\vec{v}_1 - \vec{v}_2|$.

(b) Prove Heron’s formula: the area of a triangle with sides of length $a$, $b$,
and $c$, is

$$\sqrt{p(p - a)(p - b)(p - c)}, \quad\text{where}\quad p = \frac{a + b + c}{2}.$$


4.9.9 Compute the area of the parallelograms spanned by the two vectors in 
(a) and (b), and the volume of the parallelepipeds spanned by the three vectors 
in (c) and (d). 


( a ) 

• «■ 

5 

2 

1 

-2 

-1 

( C ) 

■ r 

3 

> 

2 

-1 

1 

" 3 * 

6 






-1 


-1 


0 


(b) 

'6' 

4 

5 

'3 

2 

(d) 

'2' 

3 

> 

‘-r 

2 

> 

’O' 

1 


• m 




4 


3 


2 


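4.10.1 Using Fubini, compute the integral of Example 4.10.4:

$$\int_{D_R} (x^2 + y^2)\,dx\,dy, \quad\text{where}\quad D_R = \left\{ \begin{pmatrix} x \\ y \end{pmatrix} \in \mathbb{R}^2 \;\middle|\; x^2 + y^2 \le R^2 \right\}.$$

4.10.2 Show that in complex notation, with $z = x + iy$, the equation of the
lemniscate can be written $|z^2 + 1| = 1$.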


4.10.3 Derive the change of variables formula for cylindrical coordinates from 
the polar formula and Fubini’s theorem. 


4.10.4 (a) What is the area of the ellipse

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} \le 1?$$

(b) What is the volume of the ellipsoid

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} \le 1?$$

4.10.5 (a) Sketch the curve in the plane given in polar coordinates by the
equation

$$r = 1 + \sin\theta, \qquad 0 \le \theta \le 2\pi.$$
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Hint for Exercise 4.10.7: You 
may want to use Theorem 3.7.12. 


Hint: First transform it into 
a multiple integral, then pass to 
spherical coordinates. 


(b) Find the area that it encloses. 

4.10.6 A semi-circle of radius $R$ has density $\rho\begin{pmatrix} x \\ y \end{pmatrix} = m(x^2 + y^2)$ proportional
to the square of the distance to the center. What is its mass?


4.10.7 Let $A$ be an $n \times n$ symmetric matrix, such that the quadratic form
$Q_A(x) = x \cdot Ax$ is positive definite. What is the volume of the region $Q_A(x) \le 1$?


4.10.8 Let

$$V = \left\{ x \ge 0,\ y \ge 0,\ z \ge 0,\ \frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} \le 1 \right\}.$$

Compute

$$\int_V xyz\,|dx\,dy\,dz|.$$


4.10.9 (a) What is the analog of spherical coordinates in four dimensions? What does the change of variables formula say in that case?

(b) What is the integral of $|\mathbf{x}|$ over the ball of radius $R$ in $\mathbb{R}^4$?


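4.10.10 Show that the mapping

$$S : \begin{pmatrix} r \\ \theta \\ \varphi \end{pmatrix} \mapsto \begin{pmatrix} r\sin\varphi\cos\theta \\ r\sin\varphi\sin\theta \\ r\cos\varphi \end{pmatrix}$$

with $0 \le r < \infty$, $0 \le \theta \le 2\pi$, and $0 \le \varphi \le \pi$, parametrizes space by the distance from the origin $r$, the polar angle $\theta$, and the angle from the north pole, $\varphi$.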


4.10.11 Justify that the volume of the sphere of radius $R$ is $\dfrac{4\pi}{3}R^3$.

4.10.12 Evaluate the iterated integral 



y/4-x 2 - y 2 

{x 2 + y 2 + z 2 ) 3/2 dz dy dx. 


4.10.13 Find the volume of the region between the cone of equation z 2 = 
x 2 + y 2 and the paraboloid of equation z = x 2 + y 2 . 


4.10.14 (a) Let $Q$ be the part of the unit ball $x^2 + y^2 + z^2 \le 1$ where $x, y, z \ge 0$. Using spherical coordinates, set up the integral

$$\int_Q (x + y + z)\,|d^3\mathbf{x}|$$

as an iterated integral.
Hint for Exercise 4.10.15: You may wish to use the trigonometric formulas

$$4\cos^3\theta = \cos 3\theta + 3\cos\theta, \qquad 2\cos\varphi\cos\theta = \cos(\theta + \varphi) + \cos(\theta - \varphi).$$


(b) Compute the integral. 

4.10.15 (a) Sketch the curve given in polar coordinates by $r = \cos 2\theta$, $|\theta| \le \pi/4$.

(b) Where is the center of gravity of the region $0 \le r \le \cos 2\theta$, $|\theta| \le \pi/4$?

4.10.16 What is the center of gravity of the region $A$ defined by the inequalities $x^2 + y^2 \le z \le 1$, $x \ge 0$, $y \ge 0$?

4.10.17 Let $Q_a = [0, a] \times [0, a] \subset \mathbb{R}^2$ be the square of side $a$ in the first quadrant, with two sides on the axes, let $\Phi : \mathbb{R}^2 \to \mathbb{R}^2$ be given by

and $A = \Phi(Q_a)$.

(a) Sketch $A$, by computing the image of each of the sides of $Q_a$ (it might help to begin by drawing carefully the curves of equation $y = e^x + 1$ and $y = e^{-x} + 1$).

(b) Show that $\Phi : Q_a \to A$ is one to one.

(c) What is $\int_A y\,|dx\,dy|$?

4.10.18 What is the volume of the part of the ball $x^2 + y^2 + z^2 \le 4$ where $z^2 \ge x^2 + y^2$, $z \ge 0$?

4.10.19 Let $Q = [0, 1] \times [0, 1]$ be the unit square in $\mathbb{R}^2$, and let $\Phi : \mathbb{R}^2 \to \mathbb{R}^2$ be given by

♦(sMjf+S) and 4 = 

(a) Sketch $A$, by computing the image of each of the sides of $Q$ (they are all arcs of parabolas).

(b) Show that $\Phi : Q \to A$ is one to one.

(c) What is $\int_A x\,|dx\,dy|$?

4.10.20 The moment of inertia of a body $X \subset \mathbb{R}^3$ around an axis is the integral

$$\int_X (r(\mathbf{x}))^2\,|d^3\mathbf{x}|,$$

where $r(\mathbf{x})$ is the distance from $\mathbf{x}$ to the axis.

(b) Let $f$ be a positive continuous function of $x \in [a, b]$. What is the moment of inertia around the $x$-axis of the body obtained by rotating the region $0 \le y \le f(x)$, $a \le x \le b$ around the $x$-axis?

(c) What number does this give when

$f(x) = \cos x$, $a =$
Exercises for Section 4.11: Improper Integrals

4.11.1 (a) Show that the three sequences of functions in Example 4.11.11 do not converge uniformly.

*(b) Show that the sequence of polynomials of degree $\le m$

$$p_k(x) = a_{0,k} + a_{1,k}x + \cdots + a_{m,k}x^m$$

does not converge uniformly on $\mathbb{R}$, unless the sequence $a_{i,k}$ is eventually constant for all $i > 0$ and the sequence $a_{0,k}$ converges.

(c) Show that if the sequences $a_{i,k}$ converge for each $i \le m$, and $A$ is a bounded set, then the sequence $p_k\chi_A$ converges uniformly.

4.11.2 Let $a_n = \dfrac{(-1)^{n+1}}{n}$.

(a) Show that the series $\sum a_n$ is convergent.

*(b) Show that $\sum_{n=1}^\infty a_n = \log 2$.

(c) Explain how to rearrange the terms of the series so that it converges to 5.

(d) Explain how to rearrange the terms of the series so that it diverges.

4.11.3 For the first two sequences of functions in Example 4.11.11, show that

$$\lim_{k\to\infty}\lim_{R\to\infty}\int_{\mathbb{R}} [f_k(x)]_R\,dx \ne \lim_{R\to\infty}\lim_{k\to\infty}\int_{\mathbb{R}} [f_k(x)]_R\,dx.$$


4.11.4 In this exercise we will show that

$$\int_0^\infty \frac{\sin x}{x}\,dx = \frac{\pi}{2}. \tag{4.11.4}$$

This function is not integrable in the sense of Section 4.11, and the integral should be understood as

$$\int_0^\infty \frac{\sin x}{x}\,dx = \lim_{a\to\infty}\int_0^a \frac{\sin x}{x}\,dx.$$

(a) Show that

$$\int_a^b\left(\int_0^\infty e^{-px}\sin x\,dx\right)dp = \int_0^\infty\left(\int_a^b e^{-px}\sin x\,dp\right)dx$$

for all $0 < a < b < \infty$.

(b) Use (a) to show

$$\arctan b - \arctan a = \int_0^\infty \frac{(e^{-ax} - e^{-bx})\sin x}{x}\,dx.$$

(c) Why does Theorem 4.11.12 not imply that

$$\lim_{a\to 0}\lim_{b\to\infty}\int_0^\infty \frac{(e^{-ax} - e^{-bx})\sin x}{x}\,dx = \int_0^\infty \frac{\sin x}{x}\,dx\,?$$
(d) As it turns out, the equation in part (c) is true anyway; prove it. The following lemma is the key: If $c_n(t) > 0$ are monotone increasing functions of $t$, with $\lim_{t\to\infty} c_n(t) = C_n$, and decreasing as a function of $n$ for each fixed $t$, tending to 0, then

$$\lim_{t\to\infty}\sum_{n=1}^\infty (-1)^n c_n(t) = \sum_{n=1}^\infty (-1)^n C_n.$$

Remember that the next omitted term is a bound for the error for each partial sum.

(e) Write

$$\int_0^\infty \frac{(e^{-ax} - e^{-bx})\sin x}{x}\,dx = \sum_{k=0}^\infty \int_{k\pi}^{(k+1)\pi} \frac{(e^{-ax} - e^{-bx})\sin x}{x}\,dx$$

and use (d) to prove Equation 4.11.4.


4.11.5 (a) Show that the integral ~^-dx of Equation 4.11.5 is equal to 

the sum of the series 


Hint for Exercise 4.11.6: you will need the dominated convergence theorem (Theorem 4.11.12) to prove this.



4.11.6 Let $P_k$ be the space of polynomials of degree at most $k$. Consider the function $F : P_k \to \mathbb{R}$ given by $p \mapsto \int_0^1 |p(x)|\,dx$.

(a) Show that F is differentiable except at 0, and compute the derivative. 

*(b) Show that if p has only simple roots between 0 and 1, then F is twice 
differentiable at p. 




5 

Lengths of Curves, Areas of Surfaces, ... 


5.0 Introduction 

In Chapter 4 we saw how to integrate over subsets of $\mathbb{R}^n$, first using dyadic pavings, and then more general pavings. But these subsets are flat. What if we want to integrate over a (curvy) surface in $\mathbb{R}^3$, or more generally, a $k$-manifold in $\mathbb{R}^n$? There are many situations of obvious interest, like the area of a surface, or the total energy stored in the surface tension of a soap bubble, or the amount of fluid flowing through a pipe, which clearly are some sort of surface integral. In a physics course, for example, you may have learned that the electric flux through a closed surface is proportional to the electric charge inside that surface.

A first thing to realize is that you can't just consider a surface $S$ as a subset of $\mathbb{R}^3$ and integrate a function in $\mathbb{R}^3$ over $S$. The surface $S$ has three-dimensional volume 0, so such an integral will certainly vanish. Instead, we need to rethink the whole process of integration.

At heart, integration is always the same: 

Break up the domain into little pieces, assign a little number to each little piece, and finally add together all the numbers. Then break the domain into littler pieces and repeat, taking the limit as the decomposition becomes infinitely fine. The integrand is the thing that assigns the number to the little piece of the domain.

In this chapter we will show how to compute things like arc length (already discussed in Section 3.8), surface area and higher dimensional analogs, including fractals. We will be integrating expressions like $f(\mathbf{x})\,|d^k\mathbf{x}|$ over $k$-dimensional manifolds, where $|d^k\mathbf{x}|$ assigns to a $k$-dimensional manifold its area. Later, in Chapter 6, we will study a different kind of integrand, which assigns numbers to oriented manifolds.

What does “little piece” mean? 

The words "little piece" in the heuristic description above need to be pinned down to something more precise before we can do anything useful. There is quite a bit of leeway here; choosing a decomposition of a surface into little pieces is analogous to choosing a paving, and as we saw in Section 4.7, there were many possible choices besides the dyadic paving. We will choose to approximate curves, surfaces, and more general $k$-manifolds by $k$-parallelograms. These are described, and their volumes computed, in the next section.

We can only integrate over parametrized domains, and if we stick with the 
definition of parametrizations introduced in Chapter 3, we will not be able to 
parametrize even such simple objects as the circle and the sphere. Fortunately, 
for the purposes of integration, a looser definition of parametrization will suffice; 
we discuss this in Section 5.2. 

5.1 Parallelograms and their Volumes 

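We specify a $k$-parallelogram in $\mathbb{R}^n$ by the point $\mathbf{x}$ where it is anchored, and the $k$ vectors which span it. More precisely:

Definition 5.1.1 ($k$-parallelogram in $\mathbb{R}^n$). A $k$-parallelogram in $\mathbb{R}^n$ is the subset of $\mathbb{R}^n$

$$P_{\mathbf{x}}(\vec v_1, \dots, \vec v_k) = \{\mathbf{x} + t_1\vec v_1 + \cdots + t_k\vec v_k \mid 0 \le t_1, \dots, t_k \le 1\},$$

where $\mathbf{x} \in \mathbb{R}^n$ is a point and $\vec v_1, \dots, \vec v_k$ are $k$ vectors. The corner $\mathbf{x}$ is part of the data, but the order in which the vectors are listed is not.

For example,

(1) $P_{\mathbf{x}}(\vec v)$ is the line segment joining $\mathbf{x}$ to $\mathbf{x} + \vec v$.

(2) $P_{\mathbf{x}}(\vec v_1, \vec v_2)$ is the (ordinary) parallelogram with its four vertices at $\mathbf{x}$, $\mathbf{x} + \vec v_1$, $\mathbf{x} + \vec v_2$, $\mathbf{x} + \vec v_1 + \vec v_2$.

(3) $P_{\mathbf{x}}(\vec v_1, \vec v_2, \vec v_3)$ is the (ordinary) parallelepiped with its eight vertices at $\mathbf{x}$, $\mathbf{x} + \vec v_1$, $\mathbf{x} + \vec v_2$, $\mathbf{x} + \vec v_3$, $\mathbf{x} + \vec v_1 + \vec v_2$, $\mathbf{x} + \vec v_1 + \vec v_3$, $\mathbf{x} + \vec v_2 + \vec v_3$, $\mathbf{x} + \vec v_1 + \vec v_2 + \vec v_3$.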

The volume of $k$-parallelograms

Clearly the $k$-dimensional volume of a $k$-parallelogram $P_{\mathbf{x}}(\vec v_1, \dots, \vec v_k)$ does not depend on the position of $\mathbf{x}$ in $\mathbb{R}^n$. But it isn't obvious how to compute this volume. Already the area of a parallelogram in $\mathbb{R}^3$ is the length of the cross product of the two vectors spanning it (Proposition 1.4.19), and the formula is quite messy. How will we compute the area of a parallelogram in $\mathbb{R}^4$, where the cross product does not exist, never mind a 3-parallelogram in $\mathbb{R}^5$?

It comes as a nice surprise that there is a very pretty formula that covers all cases. The following proposition, which seems so innocent, is the key.

Notice that if the 3-parallelogram in (3) is in $\mathbb{R}^2$, it must be squashed flat, and it can perfectly well be squashed flat even if $n > 2$: this will happen if $\vec v_1, \vec v_2, \vec v_3$ are linearly dependent.

It follows from Definition 5.1.1 that the order in which we take the vectors doesn't matter: $P_{\mathbf{x}}(\vec v_1, \vec v_2) = P_{\mathbf{x}}(\vec v_2, \vec v_1)$, $P_{\mathbf{x}}(\vec v_1, \vec v_2, \vec v_3) = P_{\mathbf{x}}(\vec v_1, \vec v_3, \vec v_2)$, and so on.
Proposition 5.1.2 (Volume of a $k$-parallelogram in $\mathbb{R}^k$). Let $\vec v_1, \dots, \vec v_k$ be $k$ vectors in $\mathbb{R}^k$, so that $T = [\vec v_1, \dots, \vec v_k]$ is a square $k \times k$ matrix. Then

$$\operatorname{vol}_k P(\vec v_1, \dots, \vec v_k) = \sqrt{\det(T^\top T)}. \tag{5.1.1}$$

Proof. We have

$$\sqrt{\det(T^\top T)} = \sqrt{(\det T^\top)(\det T)} = \sqrt{(\det T)^2} = |\det T|. \qquad\square \tag{5.1.2}$$

In the proof of Proposition 5.1.2 we use Theorems 4.8.7 and 4.8.10: $\det A\,\det B = \det(AB)$ and (for square matrices) $\det A = \det A^\top$.

We follow the common convention according to which the square root symbol of a positive number denotes the positive square root: $\sqrt{a^2} = |a|$, not $\pm a$.

Note that one consequence of the proof of Proposition 5.1.2 is that if $T$ is any matrix, $T^\top T$ always has a non-negative determinant: $\det(T^\top T) \ge 0$.

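Proposition 5.1.2 is obviously true, but why is it interesting? The product $T^\top T$ works out to be

$$T^\top T = \begin{bmatrix} \cdots & \vec v_1^\top & \cdots \\ \cdots & \vec v_2^\top & \cdots \\ & \vdots & \\ \cdots & \vec v_k^\top & \cdots \end{bmatrix} \begin{bmatrix} \vdots & \vdots & & \vdots \\ \vec v_1 & \vec v_2 & \cdots & \vec v_k \\ \vdots & \vdots & & \vdots \end{bmatrix} = \begin{bmatrix} |\vec v_1|^2 & \vec v_1\cdot\vec v_2 & \cdots & \vec v_1\cdot\vec v_k \\ \vec v_1\cdot\vec v_2 & |\vec v_2|^2 & \cdots & \vec v_2\cdot\vec v_k \\ \vdots & \vdots & \ddots & \vdots \\ \vec v_1\cdot\vec v_k & \vec v_2\cdot\vec v_k & \cdots & |\vec v_k|^2 \end{bmatrix}.$$

(We follow the format for matrix multiplication introduced in Example 1.2.3.) The point of this is that the entries of $T^\top T$ are all dot products of the vectors $\vec v_i$. In particular, they are computable from the lengths of the vectors $\vec v_1, \dots, \vec v_k$ and angles between these vectors; no further information about the vectors is needed.

Recall (Proposition 1.4.3) that $\vec x\cdot\vec y = |\vec x|\,|\vec y|\cos\alpha$, where $\alpha$ is the angle between $\vec x$ and $\vec y$.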


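Example 5.1.3 (Computing the volume of parallelograms in $\mathbb{R}^2$ and $\mathbb{R}^3$). When $k = 2$, we have

$$\det(T^\top T) = \det\begin{bmatrix} |\vec v_1|^2 & \vec v_1\cdot\vec v_2 \\ \vec v_1\cdot\vec v_2 & |\vec v_2|^2 \end{bmatrix} = |\vec v_1|^2|\vec v_2|^2 - (\vec v_1\cdot\vec v_2)^2. \tag{5.1.4}$$

If you write $\vec v_1\cdot\vec v_2 = |\vec v_1||\vec v_2|\cos\theta$, this becomes

$$\det(T^\top T) = |\vec v_1|^2|\vec v_2|^2(1 - \cos^2\theta) = |\vec v_1|^2|\vec v_2|^2\sin^2\theta, \tag{5.1.5}$$

so that the area of the parallelogram spanned by $\vec v_1, \vec v_2$ is

$$\operatorname{vol}_2 P(\vec v_1, \vec v_2) = \sqrt{\det(T^\top T)} = |\vec v_1||\vec v_2||\sin\theta|. \tag{5.1.6}$$

Of course, this should come as no surprise; we got the same thing in Equation 1.4.35. But exactly the same computation in the case $n = 3$ leads to a much less familiar formula. Suppose $T = [\vec v_1, \vec v_2, \vec v_3]$, and let us call $\theta_1$ the angle between $\vec v_2$ and $\vec v_3$, $\theta_2$ the angle between $\vec v_1$ and $\vec v_3$, and $\theta_3$ the angle between $\vec v_1$ and $\vec v_2$. Then

$$T^\top T = \begin{bmatrix} |\vec v_1|^2 & \vec v_1\cdot\vec v_2 & \vec v_1\cdot\vec v_3 \\ \vec v_1\cdot\vec v_2 & |\vec v_2|^2 & \vec v_2\cdot\vec v_3 \\ \vec v_1\cdot\vec v_3 & \vec v_2\cdot\vec v_3 & |\vec v_3|^2 \end{bmatrix} \tag{5.1.7}$$

and $\det T^\top T$ is given by

$$\begin{aligned} &|\vec v_1|^2|\vec v_2|^2|\vec v_3|^2 + 2(\vec v_1\cdot\vec v_2)(\vec v_2\cdot\vec v_3)(\vec v_1\cdot\vec v_3) - |\vec v_1|^2(\vec v_2\cdot\vec v_3)^2 - |\vec v_2|^2(\vec v_1\cdot\vec v_3)^2 - |\vec v_3|^2(\vec v_1\cdot\vec v_2)^2 \\ &\quad = |\vec v_1|^2|\vec v_2|^2|\vec v_3|^2\bigl(1 + 2\cos\theta_1\cos\theta_2\cos\theta_3 - (\cos^2\theta_1 + \cos^2\theta_2 + \cos^2\theta_3)\bigr). \end{aligned} \tag{5.1.8}$$

This would not be easy to compute directly from $\det T$, since we don't actually know the vectors.

Exercise 5.1.3 asks you to show that if $\vec v_1, \dots, \vec v_k$ are linearly dependent, $\operatorname{vol}_k(P(\vec v_1, \dots, \vec v_k)) = 0$. In particular, this shows that if $k > n$, $\operatorname{vol}_k(P(\vec v_1, \dots, \vec v_k)) = 0$.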

For instance, the volume of a parallelepiped spanned by three unit vectors, each making an angle of $\pi/4$ with the others, is

$$\sqrt{1 + 2\cos^3\frac{\pi}{4} - 3\cos^2\frac{\pi}{4}} = \sqrt{\frac{\sqrt{2} - 1}{2}} \approx 0.455. \tag{5.1.9}$$


Thus we have a formula for the volume of a parallelogram that depends only on the lengths and angles of the vectors that span it; we do not need to know what or where the vectors actually are. In particular, this formula is just as good for a $k$-parallelogram in any $\mathbb{R}^n$, even (and especially) if $n > k$.
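As a numerical check of Equations 5.1.8 and 5.1.9, here is a minimal Python sketch that builds $T^\top T$ from lengths and angles alone, without ever writing down the vectors. The helper names `det` and `gram_from_lengths_and_angles` are hypothetical and introduced only for this illustration.

```python
import math

def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(entry) for entry in row] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        # Choose the largest available pivot in this column.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-14:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap changes the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def gram_from_lengths_and_angles(lengths, cosines):
    """Entries |v_i||v_j|cos(angle between v_i and v_j), i.e. v_i . v_j."""
    k = len(lengths)
    return [[lengths[i] * lengths[j] * (1.0 if i == j else cosines[i][j])
             for j in range(k)] for i in range(k)]

# Three unit vectors, each making an angle of pi/4 with the others.
c = math.cos(math.pi / 4)
cosines = [[1, c, c], [c, 1, c], [c, c, 1]]
gram = gram_from_lengths_and_angles([1, 1, 1], cosines)

volume = math.sqrt(det(gram))                  # sqrt(det(T^T T))
formula = math.sqrt(1 + 2 * c**3 - 3 * c**2)   # right side of Equation 5.1.9
print(volume, formula)
```

Both numbers should agree up to rounding, which illustrates that only the dot products enter the computation.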

This leads to the following theorem. 


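Theorem 5.1.4 (Volume of a $k$-parallelogram in $\mathbb{R}^n$). Let $\vec v_1, \dots, \vec v_k$ be $k$ vectors in $\mathbb{R}^n$, and $T$ be the $n \times k$ matrix with these vectors as its columns: $T = [\vec v_1, \dots, \vec v_k]$. Then the $k$-dimensional volume of $P_{\mathbf{x}}(\vec v_1, \dots, \vec v_k)$ is

$$\operatorname{vol}_k P_{\mathbf{x}}(\vec v_1, \dots, \vec v_k) = \sqrt{\det(T^\top T)}. \tag{5.1.10}$$

Proof. If we compute $T^\top T$, we find

$$T^\top T = \begin{bmatrix} |\vec v_1|^2 & \vec v_1\cdot\vec v_2 & \cdots & \vec v_1\cdot\vec v_k \\ \vec v_1\cdot\vec v_2 & |\vec v_2|^2 & \cdots & \vec v_2\cdot\vec v_k \\ \vdots & \vdots & \ddots & \vdots \\ \vec v_1\cdot\vec v_k & \vec v_2\cdot\vec v_k & \cdots & |\vec v_k|^2 \end{bmatrix}, \tag{5.1.11}$$

which is precisely our formula for the $k$-dimensional volume of a $k$-parallelogram in terms of lengths and angles. $\square$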


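Example 5.1.5 (Volume of a 3-parallelogram in $\mathbb{R}^4$). What is the volume of the 3-parallelogram in $\mathbb{R}^4$ spanned by

$$\vec v_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix}, \quad \vec v_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 1 \end{bmatrix}, \quad \vec v_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}?$$

Set $T = [\vec v_1, \vec v_2, \vec v_3]$; then

$$T^\top T = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix} \quad\text{and}\quad \det(T^\top T) = 4,$$

so the volume is 2. $\triangle$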
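The computation in Example 5.1.5 can be reproduced with a short Python sketch of Theorem 5.1.4. The function names `gram_matrix`, `det`, and `volume` are hypothetical helpers written for this illustration; the determinant uses cofactor expansion, which is fine for the small matrices considered here but would be too slow for large $k$.

```python
import math

def gram_matrix(vectors):
    """Return T^T T, whose (i, j) entry is the dot product v_i . v_j."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors]
            for u in vectors]

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def volume(vectors):
    """k-dimensional volume of the parallelogram spanned by k vectors in R^n."""
    return math.sqrt(det(gram_matrix(vectors)))

v1, v2, v3 = [1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1]
gram = gram_matrix([v1, v2, v3])
vol = volume([v1, v2, v3])
print(gram, vol)

# Square case (Proposition 5.1.2): the result equals |det T|.
square_vol = volume([[3, 0], [1, 2]])

# Linearly dependent vectors span a flat parallelogram.
flat_vol = volume([[1, 2], [2, 4]])
```

The last two lines illustrate the square case of Proposition 5.1.2 and the remark that linearly dependent vectors give volume 0.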


5.2 Parametrizations 


In this section we are going to relax our definition of a parametrization. In Chapter 3 (Definition 3.2.5) we said that a parametrization of a manifold $M$ is a $C^1$ mapping $\varphi$ from an open subset $U \subset \mathbb{R}^n$ to $M$, which is one to one and onto, and whose derivative is also one to one.

The problem with this definition is that most manifolds do not admit a 
parametrization. Even the circle does not; neither does the sphere, nor the 
torus. On the other hand, our entire theory of integration over manifolds is 
going to depend on parametrizations, and we cannot simply give up on most 
examples. 

Let us examine what goes wrong for the circle and the sphere. The most obvious parametrization of the circle is $\gamma : t \mapsto \begin{pmatrix} \cos t \\ \sin t \end{pmatrix}$. The problem is choosing a domain: If we choose $(0, 2\pi)$, then $\gamma$ is not onto. If we choose $[0, 2\pi]$, the domain is not open, and $\gamma$ is not one to one. If we choose $[0, 2\pi)$, the domain is not open.

For the sphere, spherical coordinates 



5.2.1 


present the same sort of problem. If we use as domain $(-\pi/2, \pi/2) \times (0, 2\pi)$, then $\gamma$ is not onto; if we use $[-\pi/2, \pi/2] \times [0, 2\pi]$, then the map is not one to one, and the derivative is not one to one at points where $\varphi = \pm\pi/2$.

The key point for both these examples is that the trouble occurs on sets of volume 0, and therefore it should not matter when we integrate. Our new definition of a parametrization will be exactly the old one, except that we allow things to go wrong on sets of $k$-dimensional volume 0 when parametrizing $k$-dimensional manifolds.


Sets of $k$-dimensional volume 0 in $\mathbb{R}^n$

Let $X$ be a subset of $\mathbb{R}^n$. We need to know when $X$ is negligible as far as $k$-dimensional integrals are concerned. Intuitively it should be fairly clear what this means: points are negligible for 1-dimensional integrals or higher, points and curves are negligible for 2-dimensional integrals, etc.

It is possible to define the $k$-dimensional volume of an arbitrary subset $X \subset \mathbb{R}^n$, and we will touch on this in Section 5.6 on fractals. That definition is quite elaborate; it is considerably simpler to say when such a subset has $k$-dimensional volume 0.

The cubes in Equation 5.2.2 have side length $1/2^N$. We are summing over cubes that intersect $X$.

Proposition A19.1 in the Appendix explains why this is a reasonable definition: if $X$ has $k$-dimensional volume 0, then its projection onto $k$ coordinates has $k$-dimensional volume 0.

Typically, $U$ will be closed, and $X$ will be its boundary. But there are many cases where this isn't quite the case, including many which come up in practice like 5.2.3 below, where it is desirable to allow $X$ to be larger.

The mapping $\gamma$ is a strict parametrization on $U - X$, i.e., on $U$ minus the boundary and any other trouble spots of $k$-dimensional volume 0.

Figure 5.2.1. The subset of $\mathbb{R}^3$ of equation $x^2 + y^2 = z^2$ is not a manifold at the vertex.


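Definition 5.2.1 ($k$-dimensional volume 0 of a subset of $\mathbb{R}^n$). A bounded subset $X \subset \mathbb{R}^n$ has $k$-dimensional volume 0 if

$$\lim_{N\to\infty} \sum_{\substack{C \in \mathcal{D}_N(\mathbb{R}^n) \\ C \cap X \ne \emptyset}} \left(\frac{1}{2^N}\right)^k = 0. \tag{5.2.2}$$

An arbitrary subset $X$ has $k$-dimensional volume 0 if for all $R$, the bounded set $X \cap B_R(0)$ has $k$-dimensional volume 0.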
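To get a feeling for Definition 5.2.1, the following Python sketch estimates the sum in Equation 5.2.2 for the unit circle in $\mathbb{R}^2$, with $k = 2$ and with $k = 1$. It is only an approximation under a stated assumption: the cubes meeting the circle are found by sampling many points on the circle, so a cube that the circle barely clips could be missed. The function name `dyadic_sum` is hypothetical.

```python
import math

def dyadic_sum(points, level, k):
    """Approximate the sum of (1/2^level)^k over the dyadic cubes of side
    1/2^level that contain at least one of the sample points."""
    scale = 2 ** level
    # Each point lies in the cube indexed by the integer parts of scale * coordinates.
    cubes = {tuple(math.floor(coord * scale) for coord in p) for p in points}
    return len(cubes) * (1.0 / scale) ** k

# Sample the unit circle densely (assumption: 50000 samples is fine enough
# for the levels used below).
samples = 50000
circle = [(math.cos(2 * math.pi * i / samples),
           math.sin(2 * math.pi * i / samples)) for i in range(samples)]

levels = list(range(2, 9))
area_sums = [dyadic_sum(circle, N, 2) for N in levels]    # k = 2
length_sums = [dyadic_sum(circle, N, 1) for N in levels]  # k = 1

for N, s2, s1 in zip(levels, area_sums, length_sums):
    print(N, round(s2, 4), round(s1, 4))
```

The $k = 2$ sums should shrink roughly by half at each level, consistent with the circle having 2-dimensional volume 0, while the $k = 1$ sums stay bounded away from 0, consistent with a curve not being negligible for 1-dimensional integrals.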


New definition of parametrization 

From now on when we use the word "parametrization" we will mean the following; if we wish to refer to the more demanding definition of Chapter 3, we will call it a "strict parametrization."

Definition 5.2.2 (Parametrization of a manifold). Let $M \subset \mathbb{R}^n$ be a $k$-dimensional manifold and $U$ be a subset of $\mathbb{R}^k$ with boundary of $k$-dimensional volume 0; let $X \subset U$ have $k$-dimensional volume 0, and let $U - X$ be open. Then a continuous mapping $\gamma : U \to \mathbb{R}^n$ parametrizes $M$ if

(1) $\gamma(U) \supset M$;

(2) $\gamma(U - X) \subset M$;

(3) $\gamma : (U - X) \to M$ is one to one, of class $C^1$, with locally Lipschitz derivative;

(4) the derivative $[D\gamma(\mathbf{u})]$ is one to one for all $\mathbf{u}$ in $U - X$;

(5) $\gamma(X)$ has $k$-dimensional volume 0.

Often condition (1) will be an equality; for example, if $M$ is a sphere and $U$ a closed rectangle mapped to $M$ by spherical coordinates, then $\gamma(U) = M$. In that case, $X$ is the boundary of $U$, and $\gamma(X)$ consists of the poles and half a great circle (the international date line, for example), giving $\gamma(U - X) \subset M$ for condition (2).


Example 5.2.3 (Parametrization of a cone). The subset of $\mathbb{R}^3$ of equation $x^2 + y^2 = z^2$, shown in Figure 5.2.1, is not a manifold in the neighborhood of the vertex, which is at the origin. However, the subset

$$M = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \;\middle|\; x^2 + y^2 - z^2 = 0,\ 0 < z < 1 \right\} \tag{5.2.3}$$

is a manifold. Consider the map $\gamma : [0, 1] \times [0, 2\pi] \to \mathbb{R}^3$ given by

$$\gamma\begin{pmatrix} r \\ \theta \end{pmatrix} = \begin{pmatrix} r\cos\theta \\ r\sin\theta \\ r \end{pmatrix}. \tag{5.2.4}$$

If we let $U = [0, 1] \times [0, 2\pi]$, and $X = \partial U$, then $\gamma$ is a parametrization of $M$. Indeed, $\gamma([0, 1] \times [0, 2\pi]) \supset M$ (it contains the vertex and the circle of radius 1 in the plane $z = 1$, in addition to $M$), and $\gamma$ does map $(0, 1) \times (0, 2\pi)$ into $M$ (this time, it omits the line segment $x = z$, $y = 0$). The map is one to one on $(0, 1) \times (0, 2\pi)$, and so is its derivative. $\triangle$

Figure 5.2.2. The surface of equation $x^2 + y^3 + z^5 = 1$. Top: The surface seen as a graph of $x$ as a function of $y$ and $z$ (i.e., parametrized by $y$ and $z$). The graph consists of two pieces: the positive square root and the negative square root. Middle: Parametrizing by $x$ and $z$. Bottom: Parametrizing by $x$ and $y$. Note in the bottom two graphs that the lines are drawn differently: different parametrizations give different resolutions to different areas.
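The claims of Example 5.2.3 can be spot-checked numerically. The sketch below assumes the parametrization $\gamma(r, \theta) = (r\cos\theta, r\sin\theta, r)$ and tests whether the derivative is one to one by computing $\det(T^\top T)$ for the two columns of $[D\gamma]$, in the spirit of Section 5.1: the derivative is one to one exactly when this determinant is nonzero. The function names are hypothetical.

```python
import math

def gamma(r, theta):
    """Cone parametrization of Example 5.2.3."""
    return (r * math.cos(theta), r * math.sin(theta), r)

def derivative_columns(r, theta):
    """The two columns of [D gamma]: partial derivatives in r and theta."""
    d_r = (math.cos(theta), math.sin(theta), 1.0)
    d_theta = (-r * math.sin(theta), r * math.cos(theta), 0.0)
    return d_r, d_theta

def gram_det(u, v):
    """det(T^T T) for T = [u, v]: |u|^2 |v|^2 - (u . v)^2."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    return dot(u, u) * dot(v, v) - dot(u, v) ** 2

checks = []
for r in (0.0, 0.25, 0.5, 0.9):
    for theta in (0.3, 1.0, 2.5, 5.0):
        x, y, z = gamma(r, theta)
        on_cone = abs(x * x + y * y - z * z) < 1e-12
        checks.append((r, on_cone, gram_det(*derivative_columns(r, theta))))

for r, on_cone, g in checks:
    print(r, on_cone, g)
```

Working the determinant out by hand gives $2r^2$, which vanishes only at $r = 0$, a point of $X = \partial U$ that the definition allows us to ignore.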

A small catalog of parametrizations 

As we will see below, essentially all manifolds admit parametrizations with the new definition. But it is one thing to construct such a parametrization using the implicit function theorem, and another to write down a parametrization explicitly. Below we give a few examples, which frequently show up in applications and exam problems.

Graphs. If $U$ is an open subset of $\mathbb{R}^k$ with boundary $\partial U$ of $k$-dimensional volume 0, and $\mathbf{f} : U \to \mathbb{R}^{n-k}$ is a $C^1$ mapping, then the graph of $\mathbf{f}$ is a manifold in $\mathbb{R}^n$, and the map

$$\mathbf{x} \mapsto \begin{pmatrix} \mathbf{x} \\ \mathbf{f}(\mathbf{x}) \end{pmatrix} \tag{5.2.5}$$

is a parametrization.

There are many cases where the idea of parametrizing as a graph still works, even though the conditions above are not satisfied: those where you can "solve" the defining equation for $n - k$ of the variables in terms of the other $k$.

Example 5.2.4 (Parametrizing as a graph). Consider the surface in $\mathbb{R}^3$ of equation $x^2 + y^3 + z^5 = 1$. In this case you can "solve" for $x$ as a function of $y$ and $z$:

$$x = \pm\sqrt{1 - y^3 - z^5}. \tag{5.2.6}$$

You could also solve for $y$ or for $z$, as a function of the other variables, and the three approaches give different views of the surface, as shown in Figure 5.2.2. Of course, before you can call any of these a parametrization, you have to specify exactly what the domain is. When the equation is solved for $x$, the domain is the subset of the $(y, z)$-plane where $1 - y^3 - z^5 \ge 0$. When solving for $y$, remember that every number has a unique cube root, so the function $y = (1 - x^2 - z^5)^{1/3}$ is defined at every point, but it is not differentiable when $x^2 + z^5 = 1$, so this curve must be included in the set $X$ of trouble points that can be ignored (using the notation of Definition 5.2.2). $\triangle$

Figure 5.2.3. The surface obtained by rotating the curve of equation $(1 - x)^3 = z^2$ around the $z$-axis. The surface drawn corresponds to the region where $|z| \le 1$, rotated only three quarters of a full turn.

Surfaces of revolution. The graph of a function $f(x)$ is the curve $C$ of equation $y = f(x)$. Let us suppose that $f$ takes only positive values, and rotate $C$ around the $x$-axis, to get the surface of revolution of equation

$$y^2 + z^2 = (f(x))^2. \tag{5.2.7}$$

This surface can be parametrized by

$$\gamma : \begin{pmatrix} x \\ \theta \end{pmatrix} \mapsto \begin{pmatrix} x \\ f(x)\cos\theta \\ f(x)\sin\theta \end{pmatrix}. \tag{5.2.8}$$

Again, to be precise one must specify the domain of $\gamma$. Suppose that $f : (a, b) \to \mathbb{R}$ is defined and continuously differentiable on $(a, b)$. Then the domain of $\gamma$ is $(a, b) \times [0, 2\pi]$, and $\gamma$ is one to one, with derivative also one to one on $(a, b) \times (0, 2\pi)$.

If C is a parametrized curve, (not necessarily a graph), say parametrized by 
1 1-> MO), the surface obtained by rotating C can still be parametrized by 



5.2.9 


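Spherical coordinates on the sphere of radius $R$ are a special case of this construction: If $C$ is the semi-circle of radius $R$ in the $(x, z)$-plane, parametrized by

$$\varphi \mapsto \begin{pmatrix} x = R\cos\varphi \\ z = R\sin\varphi \end{pmatrix}, \quad -\pi/2 \le \varphi \le \pi/2, \tag{5.2.10}$$

then the surface obtained by rotating this circle is precisely the sphere of radius $R$ centered at the origin in $\mathbb{R}^3$, parametrized by

$$\begin{pmatrix} \theta \\ \varphi \end{pmatrix} \mapsto \begin{pmatrix} R\cos\varphi\cos\theta \\ R\cos\varphi\sin\theta \\ R\sin\varphi \end{pmatrix},$$

the parametrization of the sphere by latitude and longitude.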


Example 5.2.5 (Surface obtained by rotating a curve). Consider the surface obtained by rotating the curve of equation $(1 - x)^3 = z^2$ in the $(x, z)$-plane around the $z$-axis. This surface has the equation $\left(1 - \sqrt{x^2 + y^2}\right)^3 = z^2$. The curve can be parametrized by

$$t \mapsto \begin{pmatrix} x = 1 - t^2 \\ z = t^3 \end{pmatrix}, \tag{5.2.12}$$

so the surface can be parametrized by

$$\begin{pmatrix} t \\ \theta \end{pmatrix} \mapsto \begin{pmatrix} (1 - t^2)\cos\theta \\ (1 - t^2)\sin\theta \\ t^3 \end{pmatrix}. \tag{5.2.13}$$

Figure 5.2.3 represents the image of the parametrization $|t| \le 1$, $0 \le \theta \le 3\pi/2$. It can be guessed from the picture (and proved from the formula) that the subset of $[-1, 1] \times [0, 3\pi/2]$ where $t = \pm 1$ are trouble points (they correspond to the top and bottom "cone points"), and so is the subset $\{0\} \times [0, 3\pi/2]$, which corresponds to a "curve of cusps." $\triangle$

The exercises contain further examples of parametrizations.
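The trouble points of Example 5.2.5 can be located numerically. The sketch below assumes the parametrization $(t, \theta) \mapsto ((1 - t^2)\cos\theta, (1 - t^2)\sin\theta, t^3)$, approximates the partial derivatives by central differences, and evaluates $\det(T^\top T)$ for the two columns of the derivative; where this vanishes, the derivative fails to be one to one. The function names and the step size `h` are choices made for this illustration.

```python
import math

def gamma(t, theta):
    """Surface of revolution of Example 5.2.5."""
    rho = 1 - t * t
    return (rho * math.cos(theta), rho * math.sin(theta), t ** 3)

def partials(t, theta, h=1e-6):
    """Central-difference approximations of the two partial derivatives."""
    def diff(p, q):
        return tuple((a - b) / (2 * h) for a, b in zip(p, q))
    d_t = diff(gamma(t + h, theta), gamma(t - h, theta))
    d_theta = diff(gamma(t, theta + h), gamma(t, theta - h))
    return d_t, d_theta

def gram_det(u, v):
    """det(T^T T) for T = [u, v]."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    return dot(u, u) * dot(v, v) - dot(u, v) ** 2

def exact_gram_det(t):
    """Hand computation: (4t^2 + 9t^4)(1 - t^2)^2, independent of theta."""
    return (4 * t**2 + 9 * t**4) * (1 - t**2) ** 2

theta = 1.2
results = []
for t in (-1.0, -0.5, 0.0, 0.5, 1.0):
    x, y, z = gamma(t, theta)
    residual = (1 - math.sqrt(x * x + y * y)) ** 3 - z * z  # surface equation
    results.append((t, residual, gram_det(*partials(t, theta)), exact_gram_det(t)))

for row in results:
    print(row)
```

The determinant vanishes at $t = 0$ and $t = \pm 1$ and is positive in between, matching the "curve of cusps" and the two "cone points."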


The existence of parametrizations 

Since our entire theory of integrals will be based on parametrizations, it would 
be nice to know that manifolds, or at least some fairly large class of manifolds, 
actually can be parametrized. 


Exercise 5.2.6 explores a manifold that cannot be parametrized. But such manifolds are pathological; you won't run into them unless you go out of your way to look for them. We do not believe that there is a reasonable way to integrate over such manifolds using Riemann integrals.


Remark. There is here some ambiguity as to what “actually” means. In the 
above examples, we came up with a formula for the parametrizing map, and 
that is what you would always like, especially if you want to evaluate an integral. 
Unfortunately, when a manifold is given by equations (the usual situation), it 
is usually impossible to find formulas for parametrizations; the parametrizing 
mappings only exist in the sense that the implicit function theorem guarantees 
their existence. If you really want to know the value of the mapping at a 
point, you will need to solve a system of nonlinear equations, presumably using 
Newton’s method; you will not be able to find a formula. A 

Theorem 5.2.6 (What manifolds can be parametrized). Let $M \subset \mathbb{R}^n$ be a manifold, such that there are finitely many open subsets $U_i \subset M$ covering $M$, corresponding subsets $V_i \subset \mathbb{R}^k$ all with boundaries of $k$-dimensional volume 0, and continuous mappings $\gamma_i : V_i \to U_i$ which are one to one on $V_i$, with derivatives which are also one to one. Then $M$ can be parametrized.


It is rather hard to think of any manifold that does not satisfy the hypotheses of the theorem, and every manifold that does satisfy them can be parametrized. Any compact manifold satisfies the hypotheses, as does any open subset of a manifold with compact closure. We will assume that our manifolds can all be parametrized. The proof of this theorem is technical and not very interesting; we do not give it in this book.


Change of parametrization

Our theory of integration over manifolds will be set up in terms of parametrizations, but of course we want the quantities computed (arc length, surface area, fluxes of vector fields, etc.) to depend only on the manifold and the integrand, not the chosen parametrization. In the next three sections we show that the length of a curve, the area of a surface and, more generally, the volume of a manifold, are independent of the parametrization used in computing the length, area or volume. In all three cases, the tool we use is the change of variables formula for improper integrals, Theorem 4.11.16: we set up a change of variables mapping and apply the change of variables formula to it. We need to justify this procedure, by showing that our change of variables mapping is something to which the change of variables formula can be applied: i.e., that satisfies the hypotheses of Theorem 4.11.16.

Recall that Theorem 4.11.16, the change of variables for improper integrals, says nothing about the behavior of the change of variables map on the boundary of its domain. This is important since often, as shown in Example 5.2.7, the mapping is not defined on the boundary. If we had only our earlier version of the change of variables formula (Theorem 4.10.12), we would not be able to use it to justify our claim that integration does not depend on the choice of parametrization.

Figure 5.2.4. The black dot in the left rectangle is mapped by $\gamma_1$ to a point in the hemisphere that is a pole for the parametrization $\gamma_2$; the inverse mapping $\gamma_2^{-1}$ then maps that point to an entire line segment. To avoid this kind of problem, we must define our change of variables mapping carefully.

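Suppose we have a $k$-dimensional manifold $M$ and two parametrizations

$$\gamma_1 : U_1 \to M \quad\text{and}\quad \gamma_2 : U_2 \to M, \tag{5.2.14}$$

where $U_1$ and $U_2$ are subsets of $\mathbb{R}^k$. Our candidate for the change of variables mapping is $\Phi = \gamma_2^{-1} \circ \gamma_1$, i.e.,

$$U_1 \xrightarrow{\ \gamma_1\ } M \xrightarrow{\ \gamma_2^{-1}\ } U_2. \tag{5.2.15}$$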

But this mapping can have serious difficulties, as shown by the following example.

Example 5.2.7 (Problems when changing parametrizations). Let $\gamma_1$ and $\gamma_2$ be two parametrizations of $S^2$ by spherical coordinates, but with different poles. Call $P_1, P_1'$ the poles for $\gamma_1$ and $P_2, P_2'$ the poles for $\gamma_2$. Then $\gamma_2^{-1} \circ \gamma_1$ is not defined at $\gamma_1^{-1}(\{P_2, P_2'\})$. Indeed, some one point in the domain of $\gamma_1$ maps to $P_2$.¹ But as shown in Figure 5.2.4, $\gamma_2$ maps a whole segment to $P_2$, so that $\gamma_2^{-1} \circ \gamma_1$ maps a point to a line segment, which is nonsense. The only way to deal with this is to remove $\gamma_1^{-1}(\{P_2, P_2'\})$ from the domain of $\Phi$, and hope that the boundary still has $k$-volume 0. In this case this is no problem: we just removed two points from the domain, and two points certainly have area 0.

Definition 5.2.2 of a parametrization was carefully calculated to make the analogous statement true in general.

Let us set up our change of variables with a bit more precision. Let $U_1$ and $U_2$ be subsets of $\mathbb{R}^k$. Following the notation of Definition 5.2.2, denote by $X_1$ the negligible "trouble spots" of $\gamma_1$, and by $X_2$ the trouble spots of $\gamma_2$. In Example 5.2.7, $X_1$ and $X_2$ consist of the points that are mapped to the poles (i.e., the lines marked in bold in Figure 5.2.4).

¹ If $P_2$ happens to be on the date line with respect to $\gamma_1$, two points map to $P_2$: in Figure 5.2.4, a point on the right-hand boundary of the rectangle, and the corresponding point on the left-hand boundary.
Theorem 4.11.16 (change of variables for improper integrals): Let $U$ and $V$ be open subsets of $\mathbb{R}^n$ whose boundaries have volume 0, and $\Phi : U \to V$ a $C^1$ diffeomorphism, with locally Lipschitz derivative. If $f : V \to \mathbb{R}$ is an integrable function, then $(f \circ \Phi)\,|\det[D\Phi]|$ is also integrable, and

$$\int_V f(\mathbf{v})\,|d^n\mathbf{v}| = \int_U (f \circ \Phi)(\mathbf{u})\,|\det[D\Phi(\mathbf{u})]|\,|d^n\mathbf{u}|.$$


Call

$$Y_1 = (\gamma_2^{-1} \circ \gamma_1)(X_1), \quad\text{and}\quad Y_2 = (\gamma_1^{-1} \circ \gamma_2)(X_2). \tag{5.2.16}$$

In Figure 5.2.4, the dark dot in the rectangle at left is $Y_2$, which is mapped by $\gamma_1$ to a pole of $\gamma_2$ and then by $\gamma_2^{-1}$ to the dark line at right; $Y_1$ is the (unmarked) dot in the right rectangle that maps to the pole of $\gamma_1$.

Set

$$U_1^{\text{ok}} = U_1 - (X_1 \cup Y_2) \quad\text{and}\quad U_2^{\text{ok}} = U_2 - (X_2 \cup Y_1); \tag{5.2.17}$$

i.e., we use the superscript "ok" to denote the domain or range of a change of variables mapping with any trouble spots of volume 0 removed.

Theorem 5.2.8. Both $U_1^{\text{ok}}$ and $U_2^{\text{ok}}$ are open subsets of $\mathbb{R}^k$ with boundaries of $k$-dimensional volume 0, and

$$\Phi : U_1^{\text{ok}} \to U_2^{\text{ok}} = \gamma_2^{-1} \circ \gamma_1$$

is a $C^1$ diffeomorphism with locally Lipschitz inverse.

Theorem 5.2.8 says that $\Phi$ is something to which the change of variables formula applies. It is proved in Appendix A.19.


5.3 Arc Length 


Sometimes the element of arc 
length is denoted dl or ds. 

Note that 

|v| = \j det^? T v), 

so that Equation 5.3.1 is a special 
case of Equation 5.1.10. 


The integrand l^x), called the element of arc length , is an integrand to be 
integrated over curves. As such, it should take a 1-parallelogram P x (v) in M n 
(i.e., a line segment) and return a number, and that is what it does: 

|d 1 x|(P x (v)) = |v|. 5.3.1 

More generally, if $f$ is a function on $\mathbb{R}^n$, then the integrand $f\,|d^1x|$ is defined by the formula

$$f\,|d^1x|\,(P_x(\vec v)) = f(x)\,|\vec v|.$$ 5.3.2

If $C \subset \mathbb{R}^n$ is a smooth curve, the integral

$$\int_C |d^1x|$$ 5.3.3

is the number obtained by the following process: approximate $C$ by little line segments as in Figure 5.3.1, apply $|d^1x|$ to each to get its length, and add. Then let the approximation become infinitely fine; the limit is by definition the length of $C$.

Archimedes (287-212 BC) used this process to prove that

$223/71 < \pi < 22/7$.

In his famous paper The Measurement of the Circle, he approximated the circle by an inscribed and a circumscribed 96-sided regular polygon. That was the beginning of integral calculus.

In Section 3.8, we carried out this computation when $I$ is the interval $[a, b]$, and $C \subset \mathbb{R}^3$ is a smooth curve parametrized by $\gamma : I \to \mathbb{R}^3$, and showed that the limit is given by

$$\int_I |d^1x|\,\bigl(P_{\gamma(t)}(\gamma'(t))\bigr)\,|dt| = \int_I |\gamma'(t)|\,|dt|.$$ 5.3.4
Figure 5.3.1.

A curve approximated by an inscribed polygon, shown already in Section 3.8.


In particular, the integral

$$\int_I |\gamma'(t)|\,|dt|$$ 5.3.5

depends only on $C$ and not on the parametrization.


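Example 5.3.1 (Graph of function). The graph of a $C^1$ function $f(x)$, for $a \le x \le b$, is parametrized by

$$\gamma(x) = \begin{pmatrix} x \\ f(x) \end{pmatrix},$$ 5.3.6

and hence its arc length is given by the integral

$$\int_{[a,b]} \left| \begin{bmatrix} 1 \\ f'(x) \end{bmatrix} \right| |dx| = \int_a^b \sqrt{1 + (f'(x))^2}\,dx.$$ 5.3.7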


Because of the square root, these integrals tend to be unpleasant or impossible to calculate in elementary terms. The following example, already pretty hard, is still one of the simplest. The length of the arc of parabola $y = ax^2$ for $0 \le x \le A$ is given by

$$\int_0^A \sqrt{1 + 4a^2x^2}\,dx.$$ 5.3.8

A table of integrals will tell you that

$$\int \sqrt{1 + u^2}\,du = \frac{u}{2}\sqrt{u^2 + 1} + \frac{1}{2}\log\left|u + \sqrt{1 + u^2}\right|.$$ 5.3.9

Setting $2ax = u$, this leads to

$$\int_0^A \sqrt{1 + 4a^2x^2}\,dx = \frac{1}{4a}\Bigl[2ax\sqrt{1 + 4a^2x^2} + \log\left|2ax + \sqrt{1 + 4a^2x^2}\right|\Bigr]_0^A$$
$$= \frac{1}{4a}\Bigl(2aA\sqrt{1 + 4a^2A^2} + \log\left|2aA + \sqrt{1 + 4a^2A^2}\right|\Bigr).$$ 5.3.10

Moral: if you want to compute arc length, brush up on your techniques of integration and dust off the table of integrals. △
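The closed form of Equation 5.3.10 can be checked against a direct numerical evaluation of the integral in Equation 5.3.8. The following is a minimal sketch; the function names and the choice of a composite Simpson rule are illustrative assumptions, not part of the text.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson rule on [a, b] with n subintervals (n is made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def parabola_length_closed_form(a, A):
    """Right-hand side of Equation 5.3.10 for y = a x^2, 0 <= x <= A (a > 0)."""
    u = 2 * a * A
    root = math.sqrt(1 + u * u)
    return (u * root + math.log(u + root)) / (4 * a)

def parabola_length_numeric(a, A):
    """The arc length integral of Equation 5.3.8, evaluated numerically."""
    return simpson(lambda x: math.sqrt(1 + 4 * a * a * x * x), 0.0, A)

if __name__ == "__main__":
    print(parabola_length_closed_form(1.0, 1.0))
    print(parabola_length_numeric(1.0, 1.0))
```

The two printed values should agree to many digits, which is a quick way to catch a slip in the substitution $u = 2ax$.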

Curves in R n have lengths even when n > 3, as the following example illus- 
trates. 

Example 5.3.2 (Length of a curve in $\mathbb{R}^4$). Let $p, q$ be two integers, and consider the curve in $\mathbb{R}^4$ parametrized by

$$\gamma(t) = \begin{pmatrix} \cos pt \\ \sin pt \\ \cos qt \\ \sin qt \end{pmatrix}, \qquad 0 \le t \le 2\pi.$$ 5.3.11

Its length is given by

$$\int_0^{2\pi} \sqrt{(-p \sin pt)^2 + (p \cos pt)^2 + (-q \sin qt)^2 + (q \cos qt)^2}\,dt = 2\pi\sqrt{p^2 + q^2}.$$ 5.3.12 △

The curve of Example 5.3.2 is important in mechanics, as well as in geometry. It is contained in a sphere $S^3 \subset \mathbb{R}^4$ (of radius $\sqrt{2}$ with the parametrization of Equation 5.3.11), and in this sphere is knotted when $p$ and $q$ are coprime and both are greater than 1.
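The definition of length as a limit of inscribed polygons works just as well in $\mathbb{R}^4$ as in the plane. The sketch below, with hypothetical helper names, approximates the curve of Example 5.3.2 by chords and compares the result with $2\pi\sqrt{p^2 + q^2}$; it assumes Python 3.8 or later for `math.dist`.

```python
import math

def curve(t, p, q):
    """The curve of Example 5.3.2 in R^4."""
    return (math.cos(p * t), math.sin(p * t), math.cos(q * t), math.sin(q * t))

def polygon_length(gamma, a, b, n):
    """Length of the inscribed polygon with n equal parameter steps on [a, b]."""
    pts = [gamma(a + (b - a) * i / n) for i in range(n + 1)]
    return sum(math.dist(pts[i], pts[i + 1]) for i in range(n))

def exact_length(p, q):
    """Value given by Equation 5.3.12."""
    return 2 * math.pi * math.sqrt(p * p + q * q)

if __name__ == "__main__":
    p, q = 2, 3
    for n in (10, 100, 1000, 10000):
        approx = polygon_length(lambda t: curve(t, p, q), 0.0, 2 * math.pi, n)
        print(n, approx, exact_length(p, q))
```

Each chord is shorter than the arc it spans, so the polygon lengths approach the exact value from below as $n$ grows.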


We can also measure data other than pure arc length, using the integral

$$\int_C f(x)\,|d^1x| = \int_I f(\gamma(t))\,|\gamma'(t)|\,|dt|;$$ 5.3.13

for instance if $f(x)$ gives the density of a wire of variable density, the integral above would give the mass of the wire.

Integrating the length of the velocity vector, $|\gamma'(t)|$, gives us distance along the curve. Proposition 5.3.3 says that if you take the same route (the same curve) from New York to Boston two times, making good time the first, and caught in a traffic jam the second, you will go the same distance both times.

In other contexts (particularly surface area), it will be much harder to define 
the analogs of “arc length” independently of a parametrization. So here we give 
a direct proof that the arc length given by Equation 5.3.5 does not depend on 
the chosen parametrization; later we will adapt this proof to harder problems. 


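Proposition 5.3.3 (Arc length independent of parametrization). Suppose $\gamma_1 : I_1 \to \mathbb{R}^3$ and $\gamma_2 : I_2 \to \mathbb{R}^3$ are two parametrizations of the same curve $C \subset \mathbb{R}^3$. Then

$$\int_{I_1} |\gamma_1'(t_1)|\,|dt_1| = \int_{I_2} |\gamma_2'(t_2)|\,|dt_2|.$$ 5.3.14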


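Example 5.3.4 (Parametrizing a half-circle). We can parametrize the upper half of the unit circle by

$$x \mapsto \begin{pmatrix} x \\ \sqrt{1 - x^2} \end{pmatrix},\ -1 \le x \le 1, \quad \text{or by} \quad t \mapsto \begin{pmatrix} \cos t \\ \sin t \end{pmatrix},\ 0 \le t \le \pi.$$ 5.3.15

In both cases we get length $\pi$. With the first parametrization we get

$$\int_{[-1,1]} \left| \begin{bmatrix} 1 \\ \dfrac{-x}{\sqrt{1 - x^2}} \end{bmatrix} \right| |dx| = \int_{[-1,1]} \sqrt{1 + \frac{x^2}{1 - x^2}}\,|dx| = \int_{[-1,1]} \frac{1}{\sqrt{1 - x^2}}\,|dx| = \bigl[\arcsin x\bigr]_{-1}^{1} = \frac{\pi}{2} - \left(-\frac{\pi}{2}\right) = \pi.$$ 5.3.16

The second gives

$$\int_{[0,\pi]} \left| \begin{bmatrix} -\sin t \\ \cos t \end{bmatrix} \right| |dt| = \int_{[0,\pi]} |dt| = \pi.$$

In Equation 5.3.16 we write $\int_{[-1,1]}$ rather than $\int_{-1}^{1}$ because we are not concerned with orientation: it doesn’t matter whether we go from -1 to 1, or from 1 to -1. For the same reason we write $|dx|$ not $dx$.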
This map $\Phi$ goes from an open subset of $I_1$ to an open subset of $I_2$, with both boundaries of one-dimensional volume 0. In fact, the boundaries consist of finitely many points. This is an easy case of the harder general case discussed in Section 5.2: defining $\Phi$ so that the change of variables formula applies to it.

Recall the change of variables formula:

$$\int_V f(v)\,|d^n v| = \int_U (f \circ \Phi)(u)\,|\det[D\Phi(u)]|\,|d^n u|.$$

In Equation 5.3.23, $|\gamma_2'|$ plays the role of $f$. Since $[D\Phi(t_1)]$ is a number,

$\det[D\Phi(t_1)] = [D\Phi(t_1)]$.

The second equality is Equation 5.3.22, and the third is Equation 5.3.21.


Proof of Proposition 5.3.3. Denote by $\Phi = \gamma_2^{-1} \circ \gamma_1 : I_1^{ok} \to I_2^{ok}$ the “change of parameters map” such that $\Phi(t_1) = t_2$. This map $\Phi$ goes from an open subset of $I_1$ to an open subset of $I_2$ by way of the curve; $\gamma_1$ takes a point of $I_1$ to the curve, and then $\gamma_2^{-1}$ takes it “backwards” from the curve to $I_2$. Substituting $\gamma_2$ for $f$, $\Phi$ for $g$ and $t_1$ for $a$ in Equation 1.8.12 of the chain rule gives

$$[D(\gamma_2 \circ \Phi)(t_1)] = [D\gamma_2(\Phi(t_1))] \circ [D\Phi(t_1)].$$ 5.3.18

Since

$$\gamma_2 \circ \Phi = \gamma_2 \circ \gamma_2^{-1} \circ \gamma_1 = \gamma_1,$$ 5.3.19

we can substitute $\gamma_1$ for $(\gamma_2 \circ \Phi)$ in Equation 5.3.18 to get

$$\underbrace{[D\gamma_1(t_1)]}_{\gamma_1'(t_1)} = \underbrace{[D\gamma_2(\Phi(t_1))]}_{\gamma_2'(t_2)}\ \underbrace{[D\Phi(t_1)]}_{(\gamma_2^{-1} \circ \gamma_1)'(t_1)}.$$ 5.3.20

Note that the matrices

$$[D\gamma_1(t_1)] = \gamma_1'(t_1) \quad \text{and} \quad [D\gamma_2(\Phi(t_1))] = \gamma_2'(\Phi(t_1)) = \gamma_2'(t_2)$$ 5.3.21

are really column vectors (they go from $\mathbb{R}$ to $\mathbb{R}^3$) and that $[D\Phi(t_1)]$, which goes from $\mathbb{R}$ to $\mathbb{R}$, is a $1 \times 1$ matrix, i.e., really a number. So when we take absolute values, Proposition 1.4.11 gives an equality, not an inequality:

$$|[D\gamma_1(t_1)]| = |[D\gamma_2(\Phi(t_1))]|\ |[D\Phi(t_1)]|.$$ 5.3.22

We now apply the change of variables formula (Theorem 4.10.12), to get

$$\int_{I_2} |\gamma_2'(t_2)|\,|dt_2| = \int_{I_1} |[D\gamma_2(\Phi(t_1))]|\ \underbrace{|[D\Phi(t_1)]|}_{|\det[D\Phi(t_1)]|}\ |dt_1|$$
$$= \int_{I_1} |[D\gamma_1(t_1)]|\,|dt_1| = \int_{I_1} |\gamma_1'(t_1)|\,|dt_1|. \quad \square$$ 5.3.23
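Proposition 5.3.3 can be illustrated numerically. The sketch below compares two parametrizations of the upper half of the unit circle: $\gamma_1(t) = (\cos t, \sin t)$ on $[0, \pi]$ and $\gamma_2(s) = (\cos s^2, \sin s^2)$ on $[0, \sqrt{\pi}]$. The second parametrization is an assumption chosen for illustration (it is not the one of Example 5.3.4); here the change of parameters map is $\Phi(t) = \sqrt{t}$, which fails to be differentiable only at the boundary point $t = 0$.

```python
import math

def speed_integral(gamma_prime, a, b, n=2000):
    """Midpoint-rule approximation of the integral of |gamma'(t)| |dt| over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        dx, dy = gamma_prime(a + (i + 0.5) * h)
        total += math.hypot(dx, dy)
    return total * h

def gamma1_prime(t):
    """Derivative of t -> (cos t, sin t)."""
    return (-math.sin(t), math.cos(t))

def gamma2_prime(s):
    """Derivative of s -> (cos s^2, sin s^2), by the chain rule."""
    return (-2 * s * math.sin(s * s), 2 * s * math.cos(s * s))

if __name__ == "__main__":
    print(speed_integral(gamma1_prime, 0.0, math.pi))
    print(speed_integral(gamma2_prime, 0.0, math.sqrt(math.pi)))
```

The speeds differ (constant 1 versus $2s$), but both integrals approximate $\pi$, the length of the half-circle.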


5.4 Surface Area 


Exercise 5.4.1 asks you to show 
that Equation 5.4.1 is another way 
of stating Equation 5.4.2. 

The “element of area,” which we denote $|d^2x|$, is often denoted $dA$.


The integrand $|d^2x|$ takes a parallelogram $P_x(\vec v_1, \vec v_2)$ and returns its area. In $\mathbb{R}^3$, this means

$$|d^2x|\,(P_x(\vec v_1, \vec v_2)) = |\vec v_1 \times \vec v_2|;$$ 5.4.1

the general formula, which works in all dimensions, and which is a special case of Theorem 5.1.4, is

$$|d^2x|\,(P_x(\vec v_1, \vec v_2)) = \sqrt{\det\left([\vec v_1, \vec v_2]^{\top} [\vec v_1, \vec v_2]\right)}.$$ 5.4.2
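For two specific vectors in $\mathbb{R}^3$, the agreement of Equations 5.4.1 and 5.4.2 is easy to check by machine. This is only a sanity check under illustrative names, not a proof; the $2 \times 2$ determinant of $[\vec v_1, \vec v_2]^{\top}[\vec v_1, \vec v_2]$ is written out by hand.

```python
import math

def cross(v, w):
    """Cross product of two vectors in R^3."""
    return (v[1] * w[2] - v[2] * w[1],
            v[2] * w[0] - v[0] * w[2],
            v[0] * w[1] - v[1] * w[0])

def area_by_cross_product(v, w):
    """Equation 5.4.1: |v x w| (only meaningful in R^3)."""
    return math.sqrt(sum(c * c for c in cross(v, w)))

def area_by_gram_determinant(v, w):
    """Equation 5.4.2: sqrt(det([v, w]^T [v, w])), valid in any R^n."""
    vv = sum(a * a for a in v)
    ww = sum(b * b for b in w)
    vw = sum(a * b for a, b in zip(v, w))
    return math.sqrt(vv * ww - vw * vw)

if __name__ == "__main__":
    v, w = (1.0, 2.0, 3.0), (-2.0, 0.0, 5.0)
    print(area_by_cross_product(v, w), area_by_gram_determinant(v, w))
```

The second function accepts vectors of any common length, which is the form used for surfaces in $\mathbb{R}^n$ with $n > 3$.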
We speak of triangles rather
than parallelograms for the same 
reason that you would want a 
three-legged stool rather than a 
chair if your floor were uneven. 
You can make all three vertices of 
a triangle touch a curved surface, 
which you can’t do for the four 
vertices of a parallelogram. 


To integrate $|d^2x|$ over a surface, we wish to break up the surface into little parallelograms, add their areas, and take the limit as the decomposition becomes fine. Similarly, to integrate $f\,|d^2x|$ over a surface, we go through the same process, except that instead of adding areas we add $f(x) \times$ area of parallelogram.

But while it is quite easy to define arc length as the limit of the length of 
inscribed polygons, and not much harder to prove that Equation 5.3.5 computes 
it, it is much harder to define surface area. In particular, the obvious idea of 
taking the limit of the area of inscribed triangles as the triangles become smaller 
and smaller only works if we are careful to prevent the triangles from becoming 
skinny as they get small, and then it isn’t obvious that such inscribed polyhedra 
exist at all (see Exercise 5.4.13). The difficulties are not insurmountable, but 
they are daunting. 

Instead, we will use Equation 5.4.3 as our definition of surface area. Since 
this depends on a parametrization, Proposition 5.4.4, the analog of Proposition 
5.3.3, becomes not a luxury but an essential step in making surface area well 
defined. 


Definition 5.4.1 (Surface area). Let $S \subset \mathbb{R}^3$ be a smooth surface parametrized by $\gamma : U \to S$, where $U$ is an open subset of $\mathbb{R}^2$. Then the area of $S$ is

$$\int_U |d^2x|\,\bigl(P_{\gamma(u)}(D_1\gamma(u), D_2\gamma(u))\bigr)\,|d^2u| = \int_U \sqrt{\det\left([D\gamma(u)]^{\top}[D\gamma(u)]\right)}\,|d^2u|.$$ 5.4.3


Let us see why this ought to be right. The area should be

$$\lim_{N \to \infty} \sum_{C \in \mathcal{D}_N(\mathbb{R}^2)} \text{Area of } \gamma(C \cap U).$$ 5.4.4

That is, we make a dyadic decomposition of $\mathbb{R}^2$ and see how $\gamma$ maps to $S$ the dyadic squares $C$ that are in $U$ or straddle it. We then sum the areas of $\gamma(C \cap U)$, which, for $C \subset U$, is the same as $\gamma(C)$; for $C$ that straddle $U$, we add to the sum the area of the image of the part of $C$ that is in $U$.

The side length of a square $C$ is $1/2^N$, so at least when $C \subset U$, the set $\gamma(C \cap U)$ is, as shown in Figure 5.4.1, approximately the parallelogram

$$P_{\gamma(u)}\left(\frac{1}{2^N} D_1\gamma(u),\ \frac{1}{2^N} D_2\gamma(u)\right),$$ 5.4.5

where $u$ is the lower left hand corner of $C$.

Figure 5.4.1.

A surface approximated by parallelograms. The point $x_0$ corresponds to $\gamma(u)$, and the vectors $\vec v_1$ and $\vec v_2$ correspond to the vectors $\frac{1}{2^N} D_1\gamma(u)$ and $\frac{1}{2^N} D_2\gamma(u)$.

That parallelogram has area

$$\frac{1}{2^{2N}} \sqrt{\det [D\gamma(u)]^{\top}[D\gamma(u)]}.$$ 5.4.6

So it seems reasonable to expect that the error we make by replacing

Area of $\gamma(C \cap U)$ by $\operatorname{vol}_2(C)\,\sqrt{\det [D\gamma(u)]^{\top}[D\gamma(u)]}$ 5.4.7
Figure 5.4.2.

The torus with the $u$ and $v$ coordinates drawn. You should imagine the straight lines as curved. By “torus” we mean the surface of the object. The solid object is called a solid torus.

We write $|du\,dv|$ in Equation 5.4.10 to avoid having to put subscripts on our variables; we could have used $u_1$ and $u_2$ rather than $u$ and $v$, and then used the integrand $|d^2u|$.

In the final, double integral, we are integrating over an oriented interval, from 0 to $2\pi$, so we write $du\,dv$ rather than $|du\,dv|$.

Note that the answer has the right units: $r$ and $R$ have units of length so $4\pi^2 rR$ has units length squared.

In Example 5.4.2 the square root that inevitably shows up (since we are computing the length of a vector) was simply $\sqrt{1}$. It is exceptional that the square root of any function can be integrated in elementary terms. Example 5.4.3 is more typical.


will disappear in the limit as $N \to \infty$. And the area given by Equation 5.4.7 is precisely a Riemann sum for the integral giving surface area:

$$\lim_{N \to \infty} \sum_{C \in \mathcal{D}_N(\mathbb{R}^2)} \operatorname{vol}_2 C\,\sqrt{\det [D\gamma(u)]^{\top}[D\gamma(u)]} = \underbrace{\int_U \sqrt{\det [D\gamma(u)]^{\top}[D\gamma(u)]}\,|d^2u|}_{\text{area of surface by Eq. 5.4.3}}.$$ 5.4.8


Unfortunately, this argument isn’t entirely convincing. The parallelograms 
above can be imagined as some sort of tiling of the surface, gluing small flat 
tiles at the corners of a grid drawn on the surface, a bit like using ceramic 
tiles to cover a curved counter. It is true that we get a better and better fit 
by choosing smaller and smaller tiles, but is it good enough? Our definition 
involves a parametrization $\gamma$; only when we have shown that surface area is
independent of parametrization can we be sure that Definition 5.4.1 is correct. 
We will verify this after computing a couple of examples of surface integrals. 


Example 5.4.2 (Area of a torus). Choose $R > r > 0$. We obtain the torus shown in Figure 5.4.2 by taking the circle of radius $r$ in the $(x, z)$-plane that is centered at $x = R$, $z = 0$, and rotating it around the $z$-axis.

This surface is parametrized by

$$\gamma\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} (R + r\cos u)\cos v \\ (R + r\cos u)\sin v \\ r\sin u \end{pmatrix},$$ 5.4.9

as Exercise 5.4.2 asks you to verify.

Then the surface area of the torus is given by the integral

$$\int_{[0,2\pi] \times [0,2\pi]} \left| \underbrace{\begin{bmatrix} -r\sin u\cos v \\ -r\sin u\sin v \\ r\cos u \end{bmatrix}}_{D_1\gamma} \times \underbrace{\begin{bmatrix} -(R + r\cos u)\sin v \\ (R + r\cos u)\cos v \\ 0 \end{bmatrix}}_{D_2\gamma} \right| |du\,dv|$$
$$= \int_{[0,2\pi] \times [0,2\pi]} r(R + r\cos u)\sqrt{(\sin u)^2 + (\cos u\sin v)^2 + (\cos u\cos v)^2}\ |du\,dv|$$
$$= r\int_0^{2\pi}\!\!\int_0^{2\pi} (R + r\cos u)\,du\,dv = 4\pi^2 rR.$$ 5.4.10 △
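The Riemann-sum description of surface area can be tried directly on the torus. The sketch below sums the area element $\sqrt{\det([D\gamma]^{\top}[D\gamma])}$ over a grid of midpoints in $[0, 2\pi] \times [0, 2\pi]$ and compares with $4\pi^2 rR$. The function names and the use of a uniform midpoint grid (rather than dyadic squares with lower left corners) are assumptions made for illustration.

```python
import math

def torus_partials(u, v, R, r):
    """Partial derivatives D_1 gamma and D_2 gamma of the parametrization 5.4.9."""
    a = R + r * math.cos(u)
    d1 = (-r * math.sin(u) * math.cos(v), -r * math.sin(u) * math.sin(v), r * math.cos(u))
    d2 = (-a * math.sin(v), a * math.cos(v), 0.0)
    return d1, d2

def area_element(d1, d2):
    """sqrt(det([D gamma]^T [D gamma])), computed from the 2x2 Gram matrix."""
    e = sum(x * x for x in d1)
    g = sum(x * x for x in d2)
    f = sum(x * y for x, y in zip(d1, d2))
    return math.sqrt(max(e * g - f * f, 0.0))

def torus_area(R, r, n=200):
    """Midpoint Riemann sum of the area element over [0, 2pi] x [0, 2pi]."""
    h = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        for j in range(n):
            v = (j + 0.5) * h
            total += area_element(*torus_partials(u, v, R, r))
    return total * h * h

if __name__ == "__main__":
    R, r = 2.0, 0.5
    print(torus_area(R, r), 4 * math.pi ** 2 * r * R)
```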


Example 5.4.3 (Surface area: a harder problem). What is the area of the graph of the function $x^2 + y^3$ above the unit square $Q \subset \mathbb{R}^2$?

Applying Equation 5.2.5, we parametrize the surface by

$$\gamma\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ y \\ x^2 + y^3 \end{pmatrix}$$

and apply Equation 5.4.3:

$$\int_Q \left| \underbrace{\begin{bmatrix} 1 \\ 0 \\ 2x \end{bmatrix}}_{D_1\gamma} \times \underbrace{\begin{bmatrix} 0 \\ 1 \\ 3y^2 \end{bmatrix}}_{D_2\gamma} \right| |dx\,dy| = \int_0^1\!\!\int_0^1 \sqrt{1 + 4x^2 + 9y^4}\,dx\,dy.$$ 5.4.11

Again, on the right-hand side of Equation 5.4.11 we drop the absolute value signs, writing $dx\,dy$, because we now have an oriented interval, from 0 to 1: $\int_0^1$.

Even Example 5.4.3 is computationally nicer than is standard: we were able to integrate with respect to $x$. If we had asked for the area of the graph of $x^3 + y^4$, we couldn’t have integrated in elementary terms with respect to either variable, and would have needed the computer to evaluate the double integral.

And what’s wrong with that? Integrals exist whether or not they can be computed in elementary terms, and a fear of numerical integrals is inappropriate in this age of computers. If you restrict yourself to surfaces whose areas can be computed in elementary terms, you are restricting yourself to a minute class of surfaces.


The integral with respect to $x$ is one we can calculate (just barely in our case, checking our result with a table of integrals). First we get

$$\int \sqrt{u^2 + a^2}\,du = \frac{u\sqrt{u^2 + a^2}}{2} + \frac{a^2\log\left(u + \sqrt{u^2 + a^2}\right)}{2}.$$ 5.4.12

This leads to the integral

$$\int_0^1\!\!\int_0^1 \sqrt{1 + 4x^2 + 9y^4}\,dx\,dy = \int_0^1 \left[ \frac{x\sqrt{4x^2 + 1 + 9y^4}}{2} + \frac{1 + 9y^4}{4}\log\left(2x + \sqrt{4x^2 + 1 + 9y^4}\right) \right]_{x=0}^{x=1} dy$$
$$= \int_0^1 \left( \frac{\sqrt{5 + 9y^4}}{2} + \frac{1 + 9y^4}{4}\log\frac{2 + \sqrt{5 + 9y^4}}{\sqrt{1 + 9y^4}} \right) dy.$$ 5.4.13

It is hopeless to try to integrate this mess in elementary terms: the first term requires elliptic functions, and we don’t know of any class of special functions in which the second term could be expressed. But numerically, this is no big problem; Simpson’s method with 20 steps gives the approximation 1.93224957.... △
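The numerical step at the end of Example 5.4.3 is short to reproduce. The sketch below applies a composite Simpson rule with 20 steps to the integrand in the last line of Equation 5.4.13; the function names are illustrative, and the exact digits obtained may depend on the variant of Simpson's method used.

```python
import math

def inner_integral(y):
    """Integrand of the last line of Equation 5.4.13 (x already integrated out)."""
    c = 1 + 9 * y ** 4
    root = math.sqrt(4 + c)  # sqrt(5 + 9 y^4)
    return root / 2 + (c / 4) * math.log((2 + root) / math.sqrt(c))

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

if __name__ == "__main__":
    print(simpson(inner_integral, 0.0, 1.0, 20))
```

Doubling the number of steps and watching how many digits stay fixed gives a rough idea of the accuracy reached.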


Surface area is independent of the choice of parametrization 

As shown by Exercise 5.4.13, it is quite difficult to give a rigorous definition 
of surface area that does not rely on a parametrization. In Definition 5.4.1 
we defined surface area using a parametrization; now we need to show that 
two different parametrizations of the same surface give the same area. Like 
Proposition 5.3.3 (the analogous statement for curves), this is an application of 
the change of variables formula. 

Proposition 5.4.4 (Surface area independent of parametrization). Let $S$ be a smooth surface in $\mathbb{R}^3$ and $\gamma_1 : U \to \mathbb{R}^3$, $\gamma_2 : V \to \mathbb{R}^3$ be two parametrizations. Then

$$\int_U \sqrt{\det\left([D\gamma_1(u)]^{\top}[D\gamma_1(u)]\right)}\,|d^2u| = \int_V \sqrt{\det\left([D\gamma_2(v)]^{\top}[D\gamma_2(v)]\right)}\,|d^2v|.$$ 5.4.14

Proof. We begin as we did with the proof of Proposition 5.3.3. Define $\Phi = \gamma_2^{-1} \circ \gamma_1 : U^{ok} \to V^{ok}$ to be the “change of parameters” map such that $v = \Phi(u)$.
Notice that the chain rule applied to the equation $\gamma_1 = \gamma_2 \circ \gamma_2^{-1} \circ \gamma_1 = \gamma_2 \circ \Phi$ gives

$$[D(\gamma_2 \circ \Phi)(u)] = [D\gamma_1(u)] = [D\gamma_2(\Phi(u))]\,[D\Phi(u)].$$ 5.4.15

If we apply the change of variables formula, we find

$$\int_V \sqrt{\det\left([D\gamma_2(v)]^{\top}[D\gamma_2(v)]\right)}\,|d^2v|$$
$$= \int_U \sqrt{\det\left([D\gamma_2(\Phi(u))]^{\top}[D\gamma_2(\Phi(u))]\right)}\ |\det[D\Phi(u)]|\ |d^2u|$$
$$= \int_U \sqrt{\det\left([D\gamma_2(\Phi(u))]^{\top}[D\gamma_2(\Phi(u))]\right)}\ \sqrt{\det\left([D\Phi(u)]^{\top}[D\Phi(u)]\right)}\ |d^2u|$$
$$= \int_U \sqrt{\det\left([D\Phi(u)]^{\top}[D\gamma_2(\Phi(u))]^{\top}[D\gamma_2(\Phi(u))][D\Phi(u)]\right)}\ |d^2u|$$
$$= \int_U \sqrt{\det\left([D(\gamma_2 \circ \Phi)(u)]^{\top}[D(\gamma_2 \circ \Phi)(u)]\right)}\ |d^2u|$$
$$= \int_U \sqrt{\det\left([D\gamma_1(u)]^{\top}[D\gamma_1(u)]\right)}\ |d^2u|. \quad \square$$ 5.4.16

To go from line one to line two of Equation 5.4.16 we use the change of variables formula. To go from line two to line three we use the fact that $[D\Phi(u)]$ is a square matrix, and that for a square matrix $A$, $\det A = \det A^{\top}$. To go from line three to four first replace $\det AB$ by $\det B \det A$ and then remember that each of these dets is a number, so they can be multiplied in any order. To go from line four to line five we use the chain rule.


Areas of surfaces in R n , n > 3 

A surface (i.e., a two-dimensional manifold) embedded in R n should have an 
area for any n, not just for n = 3. 

A first difficulty is that it is hard to imagine such surfaces, and perhaps 
impossible to visualize them. But it isn’t particularly hard to describe them 
mathematically. 

For instance, the subset of $\mathbb{R}^4$ given by the two equations $x_1^2 + x_2^2 = r_1^2$, $x_3^2 + x_4^2 = r_2^2$, is a surface; it corresponds to two equations in four unknowns. This surface is discussed in Example 5.4.5. More generally, we saw in Section 3.2 that the set $X \subset \mathbb{R}^n$ defined by the $n - k$ equations in $n$ variables

$$f(x) = 0$$

is a $k$-dimensional manifold if $[Df(x)] : \mathbb{R}^n \to \mathbb{R}^{n-k}$ is onto for each $x \in X$.

Example 5.4.5 (Area of a surface in $\mathbb{R}^4$). The surface described above, the subset of $\mathbb{R}^4$ given by the two equations

$x_1^2 + x_2^2 = r_1^2$ and $x_3^2 + x_4^2 = r_2^2$, 5.4.18
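is parametrized by

$$\gamma\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} r_1\cos u \\ r_1\sin u \\ r_2\cos v \\ r_2\sin v \end{pmatrix}, \qquad 0 \le u, v \le 2\pi,$$ 5.4.19

and since

$$\left[D\gamma\begin{pmatrix} u \\ v \end{pmatrix}\right]^{\top}\left[D\gamma\begin{pmatrix} u \\ v \end{pmatrix}\right] = \begin{bmatrix} -r_1\sin u & r_1\cos u & 0 & 0 \\ 0 & 0 & -r_2\sin v & r_2\cos v \end{bmatrix} \begin{bmatrix} -r_1\sin u & 0 \\ r_1\cos u & 0 \\ 0 & -r_2\sin v \\ 0 & r_2\cos v \end{bmatrix} = \begin{bmatrix} r_1^2 & 0 \\ 0 & r_2^2 \end{bmatrix},$$ 5.4.20

Equation 5.4.3 tells us that its area is given by

$$\int_{[0,2\pi] \times [0,2\pi]} \sqrt{\det\left(\left[D\gamma\begin{pmatrix} u \\ v \end{pmatrix}\right]^{\top}\left[D\gamma\begin{pmatrix} u \\ v \end{pmatrix}\right]\right)}\ |du\,dv| = \int_{[0,2\pi] \times [0,2\pi]} \sqrt{r_1^2 r_2^2}\ |du\,dv| = 4\pi^2 r_1 r_2.$$ 5.4.21 △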


Another class of surfaces in $\mathbb{R}^4$ which is important in many applications, and which leads to remarkably simpler computations than one might expect, uses complex variables. Consider for instance the graph of the function $f(z) = z^2$, where $z$ is complex. This graph has the equation $z_2 = z_1^2$ in $\mathbb{C}^2$, or

$x_2 = x_1^2 - y_1^2$, $y_2 = 2x_1y_1$, in $\mathbb{R}^4$. 5.4.22

Equation 5.4.3 tells us how to compute the areas of such surfaces, if we manage to parametrize them. If $S \subset \mathbb{R}^n$ is a surface, $U \subset \mathbb{R}^2$ is an open subset, and $\gamma : U \to \mathbb{R}^n$ is a parametrization of $S$, then the area of $S$ is given by

$$\int_S |d^2x| = \int_U \sqrt{\det\left([D\gamma(u)]^{\top}[D\gamma(u)]\right)}\,|d^2u|.$$ 5.4.23


Example 5.4.6 (Area of a surface in $\mathbb{C}^2$). Let us tackle the surface in $\mathbb{C}^2$ of Equation 5.4.22. More precisely, let us compute the area of the part of the surface of equation $z_2 = z_1^2$, where $|z_1| \le 1$. Polar coordinates for $z_1$ give a nice way to parametrize the surface:

$$\gamma\begin{pmatrix} r \\ \theta \end{pmatrix} = \begin{pmatrix} r\cos\theta \\ r\sin\theta \\ r^2\cos 2\theta \\ r^2\sin 2\theta \end{pmatrix}, \qquad 0 \le r \le 1,\ 0 \le \theta \le 2\pi.$$ 5.4.24
Again we need to compute the area of the parallelogram spanned by the two partial derivatives. Since

$$\left[D\gamma\begin{pmatrix} r \\ \theta \end{pmatrix}\right]^{\top}\left[D\gamma\begin{pmatrix} r \\ \theta \end{pmatrix}\right] = \begin{bmatrix} \cos\theta & \sin\theta & 2r\cos 2\theta & 2r\sin 2\theta \\ -r\sin\theta & r\cos\theta & -2r^2\sin 2\theta & 2r^2\cos 2\theta \end{bmatrix} \begin{bmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \\ 2r\cos 2\theta & -2r^2\sin 2\theta \\ 2r\sin 2\theta & 2r^2\cos 2\theta \end{bmatrix} = \begin{bmatrix} 1 + 4r^2 & 0 \\ 0 & r^2(1 + 4r^2) \end{bmatrix},$$

Equation 5.4.3 says that the area is

$$\int_{[0,1] \times [0,2\pi]} \sqrt{\det\left(\left[D\gamma\begin{pmatrix} r \\ \theta \end{pmatrix}\right]^{\top}\left[D\gamma\begin{pmatrix} r \\ \theta \end{pmatrix}\right]\right)}\ |dr\,d\theta| = \int_{[0,1] \times [0,2\pi]} \underbrace{\sqrt{r^2(1 + 4r^2)^2}}_{\text{square root of a perfect square}}\ |dr\,d\theta|$$
$$= 2\pi\int_0^1 (r + 4r^3)\,dr = 2\pi\left[\frac{r^2}{2} + r^4\right]_0^1 = 3\pi.$$ 5.4.26 △

Notice (last line of Equation 5.4.26) that the determinant ends up being a perfect square; the square root that created such trouble in Example 5.3.1, which deals with the real curve of equation $y = x^2$, causes none for the complex surface with the same equation $z_2 = z_1^2$.

This “miracle” happens for all manifolds in $\mathbb{C}^n$ given by “complex equations,” such as polynomials in complex variables. Several examples are presented in the exercises.
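Equation 5.4.23 lends itself to a generic numerical recipe: approximate the two partial derivatives, form the $2 \times 2$ matrix $[D\gamma]^{\top}[D\gamma]$, and sum the square root of its determinant over a grid. The sketch below does this for the parametrization of Equation 5.4.24. The central-difference step and the midpoint grid are assumptions chosen for simplicity, so the result is only approximate.

```python
import math

def gamma(r, theta):
    """Parametrization 5.4.24 of the surface z_2 = z_1^2 in R^4."""
    return (r * math.cos(theta), r * math.sin(theta),
            r * r * math.cos(2 * theta), r * r * math.sin(2 * theta))

def partials(f, u, v, h=1e-6):
    """Central-difference approximations of D_1 f and D_2 f at (u, v)."""
    d1 = tuple((a - b) / (2 * h) for a, b in zip(f(u + h, v), f(u - h, v)))
    d2 = tuple((a - b) / (2 * h) for a, b in zip(f(u, v + h), f(u, v - h)))
    return d1, d2

def area_element(d1, d2):
    """sqrt(det([D f]^T [D f])) from the entries of the 2x2 Gram matrix."""
    e = sum(x * x for x in d1)
    g = sum(x * x for x in d2)
    m = sum(x * y for x, y in zip(d1, d2))
    return math.sqrt(max(e * g - m * m, 0.0))

def surface_area(f, u_range, v_range, n=200):
    """Midpoint Riemann sum of the area element over a rectangle of parameters."""
    (u0, u1), (v0, v1) = u_range, v_range
    hu, hv = (u1 - u0) / n, (v1 - v0) / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += area_element(*partials(f, u0 + (i + 0.5) * hu, v0 + (j + 0.5) * hv))
    return total * hu * hv

if __name__ == "__main__":
    print(surface_area(gamma, (0.0, 1.0), (0.0, 2 * math.pi)), 3 * math.pi)
```

The same `surface_area` function can be pointed at any parametrized surface in $\mathbb{R}^n$, since nothing in it depends on the number of coordinates returned by `f`.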


5.5 Volume of Manifolds 


Everything we have done so far in this chapter works for a manifold $M$ of any dimension $k$, embedded in any $\mathbb{R}^n$. The $k$-dimensional volume of such a manifold is written

$$\int_M |d^kx|,$$

where $|d^kx|$ is the integrand that takes a $k$-parallelogram and returns its $k$-dimensional volume. Heuristically, this integral is defined by cutting up the manifold into little $k$-parallelograms, adding their $k$-dimensional volumes and taking the limit of the sums as the decomposition becomes infinitely fine.

The way to do this precisely is to parametrize $M$. That is, we find a subset $U \subset \mathbb{R}^k$ and a mapping $\gamma : U \to M$ which is differentiable, one to one and onto, with a derivative which is also one to one.

Then the $k$-dimensional volume is defined to be

$$\int_U \sqrt{\det\left([D\gamma(u)]^{\top}[D\gamma(u)]\right)}\,|d^ku|.$$ 5.5.1

The argument why this should correspond to the heuristic description is exactly the same as the one in Section 5.4, and we won’t repeat it.

The independence of this integral from the chosen parametrization also goes through without any change at all.
The proof of Proposition 5.5.1 is identical to the case of surfaces, given in Equation 5.4.16. The difference is the width of the matrix $[D\gamma_2(v)]$. In the case of surfaces, $[D\gamma_2(v)]$ is $n \times 2$; here it is $n \times k$. (Of course the transpose is also different; now it is $k \times n$.)

Proposition 5.5.1 (Volume of manifold independent of parametrization). If $U, V$ are subsets of $\mathbb{R}^k$ and $\gamma_1 : U \to M$, $\gamma_2 : V \to M$ are two parametrizations of $M$, then

$$\int_U \sqrt{\det\left([D\gamma_1(u)]^{\top}[D\gamma_1(u)]\right)}\,|d^ku| = \int_V \sqrt{\det\left([D\gamma_2(v)]^{\top}[D\gamma_2(v)]\right)}\,|d^kv|.$$


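Example 5.5.2 (Volume of a three-dimensional manifold in $\mathbb{R}^4$). Let $U \subset \mathbb{R}^3$ be an open set, and $f : U \to \mathbb{R}$ be a $C^1$ function. Then the graph of $f$ is a three-dimensional manifold in $\mathbb{R}^4$, and it comes with the natural parametrization

$$\gamma\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x \\ y \\ z \\ f(x, y, z) \end{pmatrix}.$$ 5.5.2

We then have

$$\det\left([D\gamma]^{\top}[D\gamma]\right) = \det\left( \begin{bmatrix} 1 & 0 & 0 & D_1f \\ 0 & 1 & 0 & D_2f \\ 0 & 0 & 1 & D_3f \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ D_1f & D_2f & D_3f \end{bmatrix} \right)$$
$$= \det \begin{bmatrix} 1 + (D_1f)^2 & (D_1f)(D_2f) & (D_1f)(D_3f) \\ (D_1f)(D_2f) & 1 + (D_2f)^2 & (D_2f)(D_3f) \\ (D_1f)(D_3f) & (D_2f)(D_3f) & 1 + (D_3f)^2 \end{bmatrix}$$
$$= 1 + (D_1f)^2 + (D_2f)^2 + (D_3f)^2.$$ 5.5.3

So the three-dimensional volume of the graph of $f$ is

$$\int_U \sqrt{1 + (D_1f)^2 + (D_2f)^2 + (D_3f)^2}\ |d^3x|.$$ 5.5.4

It is a challenge to find any function for which this can be integrated in elementary terms. Let us try to find the three-dimensional volume of the graph of

$$f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \frac{1}{2}\left(x^2 + y^2 + z^2\right)$$ 5.5.5

above the ball $B_R(0)$ of radius $R$ centered at the origin.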
Again, this is an integral which will stretch your ability with techniques of integration. You are asked in Exercise 5.5.5 to justify the last step of this computation.

Using spherical coordinates, this leads to

$$\int_{B_R(0)} \sqrt{1 + x^2 + y^2 + z^2}\ |d^3x| = \int_0^{2\pi}\!\!\int_{-\pi/2}^{\pi/2}\!\!\int_0^R \sqrt{1 + r^2}\ r^2\cos\varphi\ dr\,d\varphi\,d\theta$$
$$= 4\pi\int_0^R \sqrt{1 + r^2}\ r^2\,dr$$ 5.5.6
$$= \pi\left(R(1 + R^2)^{3/2} - \frac{1}{2}\log\left(R + \sqrt{1 + R^2}\right) - \frac{1}{2}R\sqrt{1 + R^2}\right).$$ △
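The last step of Equation 5.5.6 can at least be checked numerically before it is justified by hand. The sketch below compares the closed form with a Simpson approximation of $4\pi\int_0^R \sqrt{1 + r^2}\,r^2\,dr$; the names are illustrative and the check is no substitute for the derivation asked for in the exercise.

```python
import math

def graph_volume_closed_form(R):
    """Last line of Equation 5.5.6."""
    s = math.sqrt(1 + R * R)
    return math.pi * (R * s ** 3 - 0.5 * math.log(R + s) - 0.5 * R * s)

def graph_volume_numeric(R, n=2000):
    """4*pi times a composite Simpson approximation of the radial integral."""
    def f(r):
        return math.sqrt(1 + r * r) * r * r
    h = R / n
    total = f(0.0) + f(R)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return 4 * math.pi * total * h / 3

if __name__ == "__main__":
    for R in (0.5, 1.0, 2.5):
        print(R, graph_volume_closed_form(R), graph_volume_numeric(R))
```

For small $R$ both values are close to $\frac{4}{3}\pi R^3$, the volume of the ball itself, as one expects when the graph is nearly flat.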


Example 5.5.3 (Volume of an $n$-dimensional sphere in $\mathbb{R}^{n+1}$). For a final example, let us compute $\operatorname{vol}_n S^n$, where $S^n \subset \mathbb{R}^{n+1}$ is the unit sphere. It would be possible to do this using some generalization of spherical coordinates, and you are asked to do so for the 3-sphere in Exercise 5.5.7. These computations become quite cumbersome, and there is an easier method. It relates $\operatorname{vol}_n S^n$ to the $(n+1)$-dimensional volume of an $(n+1)$-dimensional ball, $\operatorname{vol}_{n+1} B^{n+1}$.

First, how might we relate the area of a disk (i.e., a two-dimensional ball, $B^2$) to the length of circles (i.e., one-dimensional spheres, $S^1$)? We could fill up the disk with concentric rings, and add together their areas, each approximately the length of the corresponding circle times some $\delta r$ representing the spacing of the circles. The length of the circle of radius $r$ is $r \times$ the length of the circle of radius 1. More generally, the part of $B^{n+1}$ between $r$ and $r + \Delta r$ should have volume $\Delta r\,(\operatorname{vol}_n(S^n(r)))$, and this approach gives

$$\operatorname{vol}_{n+1} B^{n+1} = \int_0^1 \operatorname{vol}_n S^n(r)\,dr = \int_0^1 r^n \operatorname{vol}_n S^n\,dr = \frac{1}{n+1}\operatorname{vol}_n(S^n).$$ 5.5.7

This allows us to add one more column to Table 4.5.7:


n    $c_n = \frac{n-1}{n} c_{n-2}$    Volume of ball $\beta_n = c_n \beta_{n-1}$    $\operatorname{vol}_n S^n = (n+1)\beta_{n+1}$

0    $\pi$                                                     $2$
1    $2$               $2$                                     $2\pi$
2    $\pi/2$           $\pi$                                   $4\pi$
3    $4/3$             $4\pi/3$                                $2\pi^2$
4    $3\pi/8$          $\pi^2/2$                               $8\pi^2/3$
5    $16/15$           $8\pi^2/15$                             $\pi^3$
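The table can be regenerated from the two recursions in its header together with Equation 5.5.7. In the sketch below the starting values $c_0 = \pi$ and $c_1 = 2$ are read off the table, while the convention $\beta_0 = 1$ (the volume of a point) is an assumption needed to start the recursion; the function names are illustrative.

```python
import math

def c(n):
    """c_n = ((n-1)/n) c_{n-2}, with c_0 = pi and c_1 = 2 as in the table."""
    if n == 0:
        return math.pi
    if n == 1:
        return 2.0
    return (n - 1) / n * c(n - 2)

def ball_volume(n):
    """beta_n = c_n * beta_{n-1}; beta_0 = 1 is an assumed starting value."""
    vol = 1.0
    for k in range(1, n + 1):
        vol *= c(k)
    return vol

def sphere_volume(n):
    """vol_n S^n = (n + 1) * beta_{n+1}, from Equation 5.5.7."""
    return (n + 1) * ball_volume(n + 1)

if __name__ == "__main__":
    for n in range(6):
        print(n, c(n), ball_volume(n), sphere_volume(n))
```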
5.6 Fractals and Fractional Dimension


Figure 5.6.1.

The first five steps in constructing the Koch snowflake. Its length is infinite, but length is the wrong way to measure this fractal object.


In 1919, Felix Hausdorff showed that dimensions are not limited to length, 
area, volume, . . . : we can also speak of fractional dimension. This discovery 
acquired much greater significance with the work of Benoit Mandelbrot showing 
that many objects in nature (the lining of the lungs, the patterns of frost on 
windows, the patterns formed by a film of gasoline on water, for example) are 
fractals, with fractional dimension. 

Example 5.6.1 (Koch snowflake). We construct the Koch snowflake curve $K$ as follows. Start with a line segment, say $0 \le x \le 1$, $y = 0$ in $\mathbb{R}^2$. Replace its middle third by the top of an equilateral triangle, as shown in Figure 5.6.1. This gives four segments, each one-third the length of the original segment. Now replace the middle third of each by the top of an equilateral triangle, and so on.

What is the length of this “curve”? At resolution $N = 0$, we get length 1. At resolution $N = 1$, when the curve consists of four segments, we get length $4 \cdot 1/3$. At the next resolution, the length is $16 \cdot 1/9$. As our decomposition becomes infinitely fine, the length becomes infinitely long!

“Length” is the wrong word to apply to the Koch snowflake, which is neither a curve nor a surface. It is a fractal, with fractional dimension: the Koch snowflake has dimension $\log 4/\log 3 \approx 1.26$.

Let us see why this might be the case. Call $A$ the part of the curve constructed on $[0, 1/3]$, and $B$ the whole curve, as in Figure 5.6.2. Then $B$ consists of four copies of $A$. (This is true at any level, but it is easiest to see at the first level, the top graph in Figure 5.6.1.) Therefore, in any dimension $d$, it should be true that $\operatorname{vol}_d(B) = 4\operatorname{vol}_d(A)$.

However, if you expand $A$ by a factor of 3, you get $B$. (This is true in the limit, after the construction has been carried out infinitely many times.) According to the principle that area goes as the square of the length, volume goes as the cube of the length, etc., we would expect $d$-dimensional volume to go as the $d$th power of the length, which leads to

$$\operatorname{vol}_d(B) = 3^d \operatorname{vol}_d(A).$$ 5.6.1

If you put this equation together with $\operatorname{vol}_d(B) = 4\operatorname{vol}_d(A)$, you will see that the only dimension in which the volume of the Koch curve can be different from 0 or $\infty$ is the one for which $4 = 3^d$, i.e., $d = \log 4/\log 3$.

If we break up the Koch curve into the pieces built on the sides constructed at the $n$th level (of which there are $4^n$, each of length $1/3^n$), and raise their side-lengths to the $d$th power, we find

$$4^n\left(\frac{1}{3^n}\right)^{\log 4/\log 3} = 4^n e^{n(\log 4/\log 3)(\log 1/3)} = 4^n e^{-n\log 4} = 1.$$ 5.6.2

Figure 5.6.3.

(In Equation 5.6.2 we use the fact that $a^x = e^{x\log a}$.) Although the terms have not been defined precisely, you might expect the computation above to mean

$$\int_K |dx^{\log 4/\log 3}| = 1.$$ 5.6.3


Example 5.6.2 (Sierpinski gasket). While the Koch snowflake looks like a thick curve, the Sierpinski gasket looks more like a thin surface. This is the subset of the plane obtained by taking a filled triangle of side length $l$, removing the central inscribed subtriangle, then removing the central subtriangles from the three triangles that are left, then removing the central subtriangles from the nine triangles that are left, and so on; the process is sketched in Figure 5.6.3. We claim that this is a set of dimension $\log 3/\log 2$: at the $n$th stage of the construction, sum, over all the little pieces, the side-length to the power $p$:

$$3^n\left(\frac{l}{2^n}\right)^p.$$

(If measuring length, $p = 1$; if measuring area, $p = 2$.) If the set really had a length, then the sum would converge when $p = 1$, as $n \to \infty$; in fact, the sum is infinite. If it really had an area, then the power $p = 2$ would lead to a finite nonzero limit; in fact, the limit is 0. But when $p = \log 3/\log 2$, the sum converges to $l^{\log 3/\log 2} \approx l^{1.58}$. This is the only dimension in which the Sierpinski gasket has finite, nonzero measure; in dimensions greater than $\log 3/\log 2$, the measure is 0, and in dimensions less than $\log 3/\log 2$ it is infinite. △
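The behaviour of these sums is easy to tabulate. The sketch below evaluates the Koch sum of Equation 5.6.2 and the analogous sum for the Sierpinski gasket ($3^n$ triangles of side $l/2^n$) for a few exponents; the function names are illustrative assumptions.

```python
import math

def koch_sum(n, d):
    """4^n pieces of side 1/3^n, each side raised to the power d."""
    return 4 ** n * (1 / 3 ** n) ** d

def gasket_sum(n, p, side=1.0):
    """3^n triangles of side length side/2^n, each side raised to the power p."""
    return 3 ** n * (side / 2 ** n) ** p

if __name__ == "__main__":
    d_koch = math.log(4) / math.log(3)
    d_gasket = math.log(3) / math.log(2)
    for n in (1, 5, 10, 20):
        print(n, koch_sum(n, 1), koch_sum(n, d_koch), koch_sum(n, 2))
        print(n, gasket_sum(n, 1), gasket_sum(n, d_gasket), gasket_sum(n, 2))
```

In each row the first value grows without bound, the last shrinks to 0, and only the middle one, at the critical exponent, stays fixed.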

5.7 Exercises for Chapter Five 


The second, fourth, fifth and sixth steps of the Sierpinski gasket.

Exercises for Section 5.1: Parallelograms

5.1.1 What is $\operatorname{vol}_3$ of the 3-parallelogram in $\mathbb{R}^4$ spanned by


-1- 


-O’ 


-1- 

0 


2 

>v 3 = 

1 

1 

,v 2 = 

1 

0 

.1- 


.1. 


.2. 


5.1.2 What is the volume of a parallelepiped with three sides emanating from the same vertex having lengths 1, 2, and 3, and with angles between them $\pi/3$, $\pi/4$, and $\pi/6$?

Hint for Exercise 5.1.3: Show that $\operatorname{rank}(T^{\top}T) \le \operatorname{rank} T < k$.

5.1.3 Show that if $\vec v_1, \dots, \vec v_k$ are linearly dependent, $\operatorname{vol}_k(P(\vec v_1, \dots, \vec v_k)) = 0$.

Exercises for Section 5.2: Parametrizations

5.2.1 (a) Show that the segment of diagonal $\left\{ \begin{pmatrix} x \\ x \end{pmatrix} \in \mathbb{R}^2 \ \middle|\ |x| \le 1 \right\}$ does not have one-dimensional volume 0.
... $\begin{pmatrix} \cos t \\ \cos t \\ \sin t \end{pmatrix}$ does have two-dimensional volume 0, but does not have one-dimensional volume 0.

5.2.2 Show that Proposition A19.1 is not true if X is unbounded. For in- 
stance, produce an unbounded subset of R 2 of length 0, whose projection onto 
the x-axis does not have length 0. 

5.2.3 Choose numbers $0 < r < R$, and consider the circle in the $(x, z)$-plane of radius $r$ centered at $\begin{pmatrix} x = R \\ z = 0 \end{pmatrix}$. Let $S$ be the surface obtained by rotating this circle around the $z$-axis.

(a) Write an equation for $S$, and check that it is a smooth surface.

(b) Write a parametrization for $S$, paying close attention to the sets $U$ and $X$ used.

(c) Now parametrize the part of $S$ where
i) $z > 0$;
ii) $x > 0$, $y > 0$;

iii) $z > x + y$. This is much harder, and even after finding an equation for the curve bounding the parametrizing region, you may need a computer to visualize it.

5.2.4 Let $f : [a, b] \to \mathbb{R}$ be a smooth positive function. Find a parametrization
for the surface of equation 

Ay2 ^ 

^2 + £2 = (/(*)) # 


5.2.5 Show that if $M \subset \mathbb{R}^n$ has dimension less than $k$, then $\operatorname{vol}_k(M) = 0$.

*5.2.6 Consider the open subset of $\mathbb{R}$ constructed in Example 4.4.2: list the rationals between 0 and 1, say $a_1, a_2, a_3, \dots$ and take the union

for some integer $k > 1$. Show that $U$ is a one-dimensional manifold, and that it cannot be parametrized according to Definition 5.2.2.

Exercises for Section 5.3: Arc Length

5.3.1 (a) Let $\begin{pmatrix} r(t) \\ \theta(t) \end{pmatrix}$ be a parametrization of a curve in polar coordinates. Show that the length of the piece of curve between $t = a$ and $t = b$ is given by the integral

$$\int_a^b \sqrt{(r'(t))^2 + (r(t))^2(\theta'(t))^2}\ dt.$$ 5.3.1
(b) What is the length of the spiral given in polar coordinates by

($]) = ^ t )’ a>0 

between $t = 0$ and $t = a$? What is the limit of this length as $\alpha \to 0$?

(c) Show that the spiral turns infinitely many times around the origin as $t \to \infty$. Does the length tend to $\infty$ as $t \to \infty$?

5.3.2 Use Equation 5.3.1 of Exercise 5.3.1 to give the length of the curve 



between $t = 1$ and $t = a$. Is the limit of the length finite as $a \to \infty$?

5.3.3 For what values of $a$ does the spiral

between $t = 1$ and $t = \infty$ have finite length?

5.3.4 (a) Suppose now that $\begin{pmatrix} r(t) \\ \varphi(t) \\ \theta(t) \end{pmatrix}$ is a parametrization of a curve in $\mathbb{R}^3$, written in spherical coordinates. Find the formula analogous to Equation 5.3.1 of Exercise 5.3.1 for the length of the arc between $t = a$ and $t = b$.

(b) What is the length of the curve parametrized by $r(t) = \cos t$, $\varphi(t) = t$, $\theta(t) = \tan t$ between $t = 0$ and $t = a$, where $0 < a < \pi/2$?


Exercises for Section 5.4: 
Surface Area 


5.4.1 Show that Equation 5.4.1 is another way of stating Equation 5.4.2, i.e.,
show that

$$|\vec v_1 \times \vec v_2| = \sqrt{\det\bigl([\vec v_1, \vec v_2]^\top [\vec v_1, \vec v_2]\bigr)}.$$


5.4.2 Verify that Equation 5.4.9 does indeed parametrize the torus obtained
by taking the circle of radius $r$ in the $(x, z)$-plane that is centered at $x = R$, $z = 0$,
and rotating it around the $z$-axis.


5.4.3 Compute $\int_S (x^2 + y^2 + 3z^2)\,|d^2\mathbf{x}|$, where $S$ is the part of the paraboloid
of revolution of equation $z = x^2 + y^2$ where $z < 9$.


5.4.4 What is the surface area of the part of the paraboloid of revolution
$z = x^2 + y^2$ where $z < 1$?


In Exercise 5.4.5, part (b), a 
computer and appropriate soft- 
ware will help. 


5.4.5 (a) Set up an integral to compute the integral $\int_S (x + y + z)\,|d^2\mathbf{x}|$, where
$S$ is the part of the graph of $x^3 + y^4$ above the unit circle.

(b) Can you evaluate it numerically?

For Exercise 5.4.6, part (b): the earth's circumference is 40 000 kilometers,
and the earth's axis is tilted by 23°.


5.4.6 (a) Let $S_1$ be the part of the cylinder of equation $x^2 + y^2 = 1$ with
$-1 < z < 1$, and let $S_2$ be the unit sphere. Show that the horizontal radial
projection $S_1 \to S_2$ is area-preserving.

(b) What is the area of the polar cap on earth? The tropics?

(c) Find a formula for the area $A_R(r)$ of a disk of radius $r$ on a sphere of
radius $R$ (the radius is measured on the sphere, not inside the ball). What is
the Taylor polynomial of $A_R(r)$ to degree 4?


5.4.7 Compute the area of the graph of the function $f\begin{pmatrix} x \\ y \end{pmatrix} = \tfrac{2}{3}\bigl(x^{3/2} + y^{3/2}\bigr)$
above the region $0 < x, y < 1$.


5.4.8 (a) Give a parametrization of the surface of the ellipsoid

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$$

analogous to spherical coordinates.

(b) Set up an integral to compute the surface area of the ellipsoid. 


5.4.9 (a) Set up an integral to compute the surface area of the unit sphere. 

(b) Compute the surface area (if you know the formula, as we certainly hope 
you do, simply giving it is not good enough). 


5.4.10 Let $f(x)$ be a positive $C^1$ function of $x \in [a, b]$.

(a) Find a parametrization of the surface in $\mathbb{R}^3$ obtained by rotating the
graph of $f$ around the $x$-axis.

(b) What is the area of this surface? (The answer should be in the form of 
a one-dimensional integral.) 

5.4.11 Let $X \subset \mathbb{C}^2$ be the graph of the function $w = z^k$, where both
$z = x + iy = re^{i\theta}$ and $w = u + iv$ are complex variables.

(a) Parametrize $X$ in terms of the polar coordinates $r, \theta$ for $z$.

(b) What is the area of the part of $X$ where $|z| < R$?

5.4.12 The total curvature $K(S)$ of a surface $S \subset \mathbb{R}^3$ is given by

$$K(S) = \int_S |K(\mathbf{x})|\,|d^2\mathbf{x}|.$$

(a) What is the total curvature of the sphere in $\mathbb{R}^3$ of radius $R$?

(b) What is the total curvature of the graph of the function $f$

(See Example 3.8.8.)

(c) What is the total curvature of the part of the helicoid of equation $y \cos z = x \sin z$
(see Example 3.8.9) with $0 < z < a$?

*5.4.13



be a parametrization of a parabolic cylinder.


If $T$ is a triangle with vertices $\mathbf{a}, \mathbf{b}, \mathbf{c} \in \mathbb{R}^2$, the image triangle will be by
definition the triangle in $\mathbb{R}^3$ with vertices $f(\mathbf{a}), f(\mathbf{b}), f(\mathbf{c})$. Show that there
exists a triangulation of the unit square in the $(u, v)$-plane such that the sum
of the areas of the image triangles is arbitrarily large.


Exercises for Section 5.5:
Volume of Manifolds

5.5.1 Let $M_1(n, m)$ be the space of $n \times m$ matrices of rank 1. What is the
three-dimensional volume of the part of $M_1(2, 2)$ made up of matrices $A$ with
$|A| < 1$?

5.5.2 A gas has density $C/r$, where $r = \sqrt{x^2 + y^2 + z^2}$. If $0 < a < b$, what
is the mass of the gas between the concentric spheres $r = a$ and $r = b$?

5.5.3 What is the center of gravity of a uniform wire, whose position is the
parabola of equation $y = x^2$, where $0 < x < a$?

5.5.4 Let $X \subset \mathbb{C}^2$ be the graph of the function $w = e^z + e^{-z}$, where both
$z = x + iy$ and $w = u + iv$ are complex variables. What is the area of the part
of $X$ where $-1 < x, y < 1$?


5.5.5 Justify the result in Equation 5.5.6 by computing the integral. 

5.5.6 The function $\cos z$ of the complex variable $z$ is by definition

$$\cos z = \frac{e^{iz} + e^{-iz}}{2}.$$

(a) If $z = x + iy$, write the real and imaginary parts of $\cos z$ in terms of $x$
and $y$.

(b) What is the area of the part of the graph of $\cos z$ where $-\pi < x < \pi$,
$-1 < y < 1$?


5.5.7 (a) Show that the mapping

$$\begin{pmatrix} \theta \\ \varphi \\ \psi \end{pmatrix} \mapsto \begin{pmatrix} \cos\psi \cos\varphi \cos\theta \\ \cos\psi \cos\varphi \sin\theta \\ \cos\psi \sin\varphi \\ \sin\psi \end{pmatrix}$$

parametrizes the unit sphere $S^3$ in $\mathbb{R}^4$ when $-\pi/2 < \varphi, \psi < \pi/2$, $0 < \theta < 2\pi$.

(b) Use this parametrization to compute $\operatorname{vol}_3(S^3)$.



5.5.8 What is the area of the surface in $\mathbb{C}^3$ parametrized by


7W = 



$z \in \mathbb{C}, \ |z| < 1$?


Exercises for Section 5.6:
Fractals


Figure 5.6.1. 


5.6.1 Consider the set $C$ obtained by taking $[0, 1]$, and removing first the
open middle third $(1/3, 2/3)$, then the open middle third of each of the segments
left, then the open middle third of each of the segments left, etc., as shown in
Figure 5.6.1.

(a) Show that an alternative description of C is that it is the set of points 
that can be written in base 3 without using the digit 1. Use this to show that 
C is an uncountable set. (Hint: For instance, the number written in base 3 as 
.02220000022202002222 ... is in it.) 

(b) Show that C is a pavable set, with one-dimensional volume 0. 

(c) Show that the only dimension in which C could have volume different 
from 0 or infinity is log 2/ log 3. 


5.6.2 Now let the set $C$ be obtained from the unit interval by omitting the
middle 1/5th, then the middle fifth of each of the remaining intervals, then the
middle fifth of the remaining intervals, etc.

(a) Show that an alternative description of C is that it is the set of points 
which can be written in base 5 without using the digit 2. Use this to show that 
C is an uncountable set. 

(b) Show that C is a pavable set, with one-dimensional volume 0. 

(c) What is the only dimension in which C could have volume different from 
0 or infinity? 


5.6.3 This time let the set $C$ be obtained from the unit interval by omitting
the middle $1/n$th, then the middle $1/n$th of each of the remaining intervals, then
the middle $1/n$th of the remaining intervals, etc.

(a) Show that $C$ is a pavable set, with one-dimensional volume 0.

(b) What is the only dimension in which $C$ could have volume different from
0 or infinity? What is this dimension when $n = 2$?


Forms and Vector Calculus


Gradient a 1-form? How so? Hasn’t one always known the gradient 
as a vector? Yes, indeed, but only because one was not familiar with the 
more appropriate 1-form concept. — C. Misner, K. S. Thorne, J. Wheeler, 
Gravitation 


6.0 Introduction 


What really makes calculus work is the fundamental theorem of calculus: 
that differentiation, having to do with speeds, and integration, having to do 
with areas, are somehow inverse operations. 

Obviously, we will want to generalize the fundamental theorem of calculus 
to higher dimensions. Unfortunately, we cannot do so using the techniques 
of Chapter 4 and Chapter 5, where we integrated using $|d^n\mathbf{x}|$. The reason is
that $|d^n\mathbf{x}|$ always returns a positive number; it does not concern itself with
the orientation of the subset over which it is integrating, unlike the $dx$ of
one-dimensional calculus, which does:


$$\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx.$$

You have in fact been using forms without realizing it. When you write
$d(t^2) = 2t\,dt$ you are saying something about forms, 1-forms to be precise.

To get a fundamental theorem of calculus in higher dimensions, we need to
introduce new tools. If we were willing to restrict ourselves to $\mathbb{R}^2$ and $\mathbb{R}^3$ we
could use the techniques of vector calculus. We will use a different approach,
forms, which work in any $\mathbb{R}^n$. Forms are integrands over oriented domains;
they provide the theory of expressions containing $dx$ or $dx\,dy$ ... .

Because forms work in any dimension, they are the natural way to approach
two towering subjects that are inherently four-dimensional: electromagnetism
and the theory of relativity. They also provide a unified treatment of
differentiation and of the fundamental theorem of calculus: one operator (the exterior
derivative) works in all dimensions, and one short, elegant statement (the
generalized Stokes's theorem) generalizes the fundamental theorem of calculus to
all dimensions. In contrast, vector calculus requires special formulas, operators,
and theorems for each dimension where it works.

On the other hand, the language of vector calculus is used in many science
courses, particularly at the undergraduate level. So while in theory we could
provide a unified treatment of higher dimensional calculus using only forms,
this would probably not mesh well with other courses. If you are studying
physics, for example, you definitely need to know vector calculus. In addition,
the functions and vector fields of vector calculus are more intuitive than forms.
A vector field is an object that one can picture, as in Figure 6.0.1. Coming to
terms with forms requires more effort. We can't draw you a picture of a form.
A $k$-form is, as we shall see, something like the determinant: it takes $k$ vectors,
fiddles with them until it has a square matrix, and then takes its determinant.

Figure 6.0.1. The radial vector field $\vec F \begin{pmatrix} x \\ y \end{pmatrix} = \begin{bmatrix} x \\ y \end{bmatrix}$.

The Moebius strip was discovered in 1858 by August Moebius, a German
mathematician and astronomer. Game for a rainy day: Make a big Moebius strip
out of paper. Give one young child a yellow crayon, another a blue crayon,
and start them coloring on opposite sides of the strip.

We said at the beginning of this book that the object of linear algebra "is at
least in part to extend to higher dimensions the geometric language and
intuition we have concerning the plane and space, familiar to us all from everyday
experience." Here too we want to extend to higher dimensions the geometric
language and intuition we have concerning the plane and space. We hope that
translating forms into the language of vector calculus will help you do that.

Section 6.1 gives a brief discussion of integrands over oriented domains. In
Section 6.2 we introduce $k$-forms: integrands that take a little piece of oriented
domain and return a number. In Section 6.3 we define oriented parallelograms
and show how to integrate form fields — functions that assign a form at each 
point — over parametrized domains. Section 6.4 translates the language of forms 
on R 3 into the language of vector calculus. Section 6.5 gives the definitions 
of orientation necessary to integrate form fields over oriented domains, while 
Section 6.6 discusses boundary orientation. Section 6.7 introduces the exterior 
derivative , which Section 6.8 relates to vector calculus via the grad, div, and 
curl. Sections 6.9 and 6.10 discuss the generalized Stokes’s theorem and its four 
different embodiments in the language of vector calculus. Section 6.11 addresses 
the question, important in both physics and geometry, of when a vector field is 
the gradient of a function. 

6.1 Forms as Integrands over Oriented Domains

In Chapter 4 we studied the integrand $|d^n\mathbf{x}|$, which takes a subset $A \subset \mathbb{R}^n$
and returns its $n$-dimensional volume, $\operatorname{vol}_n A$. In Chapter 5 we showed how
to integrate the integrand $|d^1\mathbf{x}|$ (the element of arc length) over a curve, to
determine its length, and how to integrate the integrand $|d^2\mathbf{x}|$ over a surface,
to determine its area. More generally, we saw how to integrate $|d^k\mathbf{x}|$ over a
$k$-dimensional manifold in $\mathbb{R}^n$, to determine its $k$-dimensional volume.

Such integrands take a little piece (of curve, surface, or higher-dimensional 
manifold) and return a number. They require no mention of the orientation of 
the piece; non-orientable surfaces like the Moebius strip shown in Figure 6.1.1
have a perfectly well-defined area, obtained by integrating $|d^2\mathbf{x}|$ over them.

The integrands above are thus fundamentally different from the integrand
$dx$ of one variable calculus, which requires oriented intervals. In one variable
calculus, the standard integrand $f(x)\,dx$ takes a piece $[x_i, x_{i+1}]$ of the domain,
and returns the number $f(x_i)(x_{i+1} - x_i)$: the area of a rectangle with height
$f(x_i)$ and width $x_{i+1} - x_i$. Note that $dx$ returns $x_{i+1} - x_i$, not $|x_{i+1} - x_i|$; that
is why

$$\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx. \tag{6.1.1}$$

While it is easy to show that a Moebius strip has only one side and therefore
can't be oriented, it is less easy to define orientation with precision. How
might a flat worm living inside a Moebius strip be able to tell that it was in a
surface with only one side? We will discuss orientation further in Section 6.5.

Analogs to the Moebius strip exist in all higher dimensions, but all curves
are orientable.

In order to generalize the fundamental theorem of calculus to higher dimen- 
sions, we need integrands over oriented objects. Forms are such integrands. 
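
The sign behavior behind Equation 6.1.1 can be checked numerically. The following is only a minimal sketch, assuming a left-endpoint Riemann sum with equal pieces; the names `oriented_riemann_sum` and `unoriented_riemann_sum` are illustrative and do not come from the text.

```python
def oriented_riemann_sum(f, a, b, n=1000):
    """Left-endpoint Riemann sum of f(x) dx from a to b.

    Each piece [x_i, x_{i+1}] contributes f(x_i) * (x_{i+1} - x_i).
    The width keeps its sign, so integrating from b to a gives a
    result of the opposite sign.
    """
    h = (b - a) / n  # signed width of each piece
    return sum(f(a + i * h) * h for i in range(n))


def unoriented_riemann_sum(f, a, b, n=1000):
    """Same sum, but each piece contributes f(x_i) * |x_{i+1} - x_i|,
    which is how an unoriented integrand like |d^1 x| behaves."""
    h = (b - a) / n
    return sum(f(a + i * h) * abs(h) for i in range(n))


def square(x):
    return x * x


forward = oriented_riemann_sum(square, 0.0, 1.0)    # close to  1/3
backward = oriented_riemann_sum(square, 1.0, 0.0)   # close to -1/3
no_sign = unoriented_riemann_sum(square, 1.0, 0.0)  # close to +1/3
print(forward, backward, no_sign)
```

The two oriented sums are not exact negatives of each other at finite `n` (one uses left endpoints of the partition, the other right endpoints), but both converge to values of opposite sign, while the unoriented sum ignores the direction of travel.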

Example 6.1.1 (Flux form of a vector field: $\Phi_{\vec F}$). Suppose we are given a
vector field $\vec F$ on some open subset $U$ of $\mathbb{R}^3$. It may help to imagine this vector
field as the velocity vector field of some fluid with a steady flow (not changing
with time). Then the integrand $\Phi_{\vec F}$ associates to a little piece of surface the
flux of $\vec F$ through that piece; if you imagine the vector field as the flow of a
fluid, then $\Phi_{\vec F}$ associates to a little piece of surface the amount of fluid that
flows through it in unit time.

But there’s a catch: to define the flux of a vector field through a surface, you 
must orient the surface, for instance by coloring the sides yellow and blue, and 
counting how much flows from the blue side to the yellow side (counting the 
flow negative if the fluid flows in the opposite direction). It obviously does not 
make sense to calculate the flow of a vector field through a Moebius strip. $\triangle$


6.2 Forms on $\mathbb{R}^n$


You should think of this section as a continuation of Section 4.8. There we
saw that there is a unique antisymmetric and multilinear function of $n$ vectors
in $\mathbb{R}^n$ that gives 1 if evaluated on the standard basis vectors: the determinant.
Because of the connection between the determinants and volumes described in
Section 4.9, the determinant is fundamental to multiple integrals, as we saw in
Section 4.10.

Here we will study the multilinear antisymmetric functions of $k$ vectors in
$\mathbb{R}^n$, where $k \ge 0$ may be any integer, though we will soon see that the only
interesting case is when $k \le n$. Again there is a close relation to volumes, and
in fact these objects, called forms, are the right integrands for integrating over
oriented domains.

The important difference between determinants and $k$-forms is that a $k$-form
on $\mathbb{R}^n$ is a function of $k$ vectors, while the determinant on $\mathbb{R}^n$ is a function
of $n$ vectors; determinants are only defined for square matrices.

The words antisymmetric and alternating are synonymous.

Antisymmetry. If you exchange any two of the arguments of $\varphi$, you change the
sign of $\varphi$:

$$\varphi(\vec v_1, \ldots, \vec v_i, \ldots, \vec v_j, \ldots, \vec v_k) = -\varphi(\vec v_1, \ldots, \vec v_j, \ldots, \vec v_i, \ldots, \vec v_k).$$

Multilinearity. If $\varphi$ is a $k$-form and $\vec v_i = a\vec u + b\vec w$, then

$$\varphi(\vec v_1, \ldots, (a\vec u + b\vec w), \ldots, \vec v_k) = a\,\varphi(\vec v_1, \ldots, \vec v_{i-1}, \vec u, \vec v_{i+1}, \ldots, \vec v_k) + b\,\varphi(\vec v_1, \ldots, \vec v_{i-1}, \vec w, \vec v_{i+1}, \ldots, \vec v_k).$$


Definition 6.2.1 ($k$-form on $\mathbb{R}^n$). A $k$-form on $\mathbb{R}^n$ is a function $\varphi$ that
takes $k$ vectors in $\mathbb{R}^n$ and returns a number, such that $\varphi(\vec v_1, \ldots, \vec v_k)$ is
multilinear and antisymmetric.

That is, a $k$-form $\varphi$ is linear with respect to each of its arguments, and
changes sign if two of the arguments are exchanged.

It is rather hard to imagine forms, so we start with an example, which will
turn out to be the fundamental example.

The entire expression $dx_{i_1} \wedge \cdots \wedge dx_{i_k}$ is just the name for this $k$-form;
for now think of it as a single item without worrying about the component
parts. The reason for the wedge $\wedge$ will be explained at the end of this section,
where we discuss the wedge product; we will see that the use of $\wedge$ in the wedge
product is consistent with its use here. In Section 6.8 we will see that the use
of $d$ in our notation here is consistent with its use to denote the exterior
derivative.

Note (Equation 6.2.4) that to give an example of a 3-form we had to add a
third vector. You cannot evaluate a 3-form on two vectors (or on four); a
$k$-form is a function of $k$ vectors. If you have more than $k$ vectors, or fewer,
then you will end up with a matrix that is not square, which will not have a
determinant. But you can evaluate a 2-form on two vectors in $\mathbb{R}^4$, as we did
above, or in $\mathbb{R}^{16}$. This is not the case for the determinant, which is a function
of $n$ vectors in $\mathbb{R}^n$.

In the top line of Equation 6.2.5 we could write

$$|d^1\mathbf{x}|(\vec v) = \sqrt{\det(\vec v^\top \vec v)},$$

in keeping with the other formulas, since det of a number equals that number.

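Example 6.2.2 ($k$-form). Let $i_1, \ldots, i_k$ be any $k$ integers between 1 and $n$.
Then $dx_{i_1} \wedge \cdots \wedge dx_{i_k}$ is that function of $k$ vectors $\vec v_1, \ldots, \vec v_k$ in $\mathbb{R}^n$ that puts
these vectors side by side, making the $n \times k$ matrix

$$\begin{bmatrix} v_{1,1} & \cdots & v_{1,k} \\ \vdots & & \vdots \\ v_{n,1} & \cdots & v_{n,k} \end{bmatrix} \tag{6.2.1}$$

and selects first the $i_1$th row, then the $i_2$th row, etc., and finally the $i_k$th row,
making the square $k \times k$ matrix

$$\begin{bmatrix} v_{i_1,1} & \cdots & v_{i_1,k} \\ \vdots & & \vdots \\ v_{i_k,1} & \cdots & v_{i_k,k} \end{bmatrix} \tag{6.2.2}$$

and finally takes its determinant. For instance,

$$\underbrace{dx_1 \wedge dx_2}_{\text{2-form}} \left( \begin{bmatrix} 1 \\ 2 \\ -1 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ -2 \\ 1 \\ 2 \end{bmatrix} \right) = \det \underbrace{\begin{bmatrix} 1 & 3 \\ 2 & -2 \end{bmatrix}}_{\text{1st and 2nd rows of original matrix}} = -8. \tag{6.2.3}$$

$$\underbrace{dx_1 \wedge dx_2 \wedge dx_4}_{\text{3-form}} \left( \begin{bmatrix} 1 \\ 2 \\ -1 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ -2 \\ 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 2 \\ 1 \end{bmatrix} \right) = \det \begin{bmatrix} 1 & 3 & 0 \\ 2 & -2 & 1 \\ 1 & 2 & 1 \end{bmatrix} = -7. \quad \triangle \tag{6.2.4}$$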
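
The recipe of Example 6.2.2 (stack the vectors as columns, select rows, take the determinant) can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names `det` and `elementary_form` are assumptions of this example, and the determinant is computed by cofactor expansion, which is adequate only for small matrices.

```python
def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 0:
        return 1  # determinant of the empty 0 x 0 matrix
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def elementary_form(indices):
    """Return the function dx_{i1} ^ ... ^ dx_{ik}; indices are 1-based."""
    def form(*vectors):
        if len(vectors) != len(indices):
            raise ValueError("a k-form must be evaluated on exactly k vectors")
        # Row number i of the n x k matrix holds component i of each vector.
        selected = [[v[i - 1] for v in vectors] for i in indices]
        return det(selected)
    return form


v1, v2, v3 = [1, 2, -1, 1], [3, -2, 1, 2], [0, 1, 2, 1]
print(elementary_form((1, 2))(v1, v2))         # -8, as in Equation 6.2.3
print(elementary_form((1, 2, 4))(v1, v2, v3))  # -7, as in Equation 6.2.4
```

Passing the wrong number of vectors raises an error, mirroring the remark that a $k$-form can only be evaluated on exactly $k$ vectors.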


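Remark. The integrand $|d^k\mathbf{x}|$ of Chapter 5 also takes $k$ vectors in $\mathbb{R}^n$ and
gives a number:

$$\begin{aligned}
|d^1\mathbf{x}|(\vec v) &= |\vec v| = \sqrt{\vec v^\top \vec v}, \\
|d^2\mathbf{x}|(\vec v_1, \vec v_2) &= \sqrt{\det\bigl([\vec v_1, \vec v_2]^\top [\vec v_1, \vec v_2]\bigr)}, \\
|d^k\mathbf{x}|(\vec v_1, \ldots, \vec v_k) &= \sqrt{\det\bigl([\vec v_1, \ldots, \vec v_k]^\top [\vec v_1, \ldots, \vec v_k]\bigr)}.
\end{aligned} \tag{6.2.5}$$

Unlike forms, these are not multilinear and not antisymmetric. $\triangle$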
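
To see concretely why $|d^k\mathbf{x}|$ fails to be antisymmetric or multilinear, one can evaluate the formula of the Remark on sample vectors. The sketch below is illustrative only; the names `det` and `unsigned_volume` are hypothetical helpers introduced for this example.

```python
import math


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def unsigned_volume(*vectors):
    """|d^k x|(v_1, ..., v_k) = sqrt(det(V^T V)), V having the v_i as columns."""
    # Entry (i, j) of V^T V is the dot product of v_i and v_j.
    gram = [[sum(a * b for a, b in zip(u, w)) for w in vectors] for u in vectors]
    return math.sqrt(det(gram))


a, b = [1, 2, -1, 1], [3, -2, 1, 2]
minus_a = [-x for x in a]

area = unsigned_volume(a, b)
swapped = unsigned_volume(b, a)        # an antisymmetric function would flip sign
negated = unsigned_volume(minus_a, b)  # a multilinear function would flip sign
print(area, swapped, negated)
```

All three printed values agree: exchanging the arguments or replacing a vector by its negative leaves the result unchanged, whereas a form would change sign in both cases.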

Geometric meaning of $k$-forms

Evaluating the 2-form $dx_1 \wedge dx_2$ on the vectors $\vec a$ and $\vec b$, we have:

$$dx_1 \wedge dx_2 \left( \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}, \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} \right) = \det \begin{bmatrix} a_1 & b_1 \\ a_2 & b_2 \end{bmatrix} = a_1 b_2 - a_2 b_1, \tag{6.2.6}$$

which can be understood geometrically. If we project $\vec a$ and $\vec b$ onto the
$(x_1, x_2)$-plane, we get the vectors

$$\begin{bmatrix} a_1 \\ a_2 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}; \tag{6.2.7}$$

the determinant in Equation 6.2.6 gives the signed area of the parallelogram
that they span, as described in Proposition 1.4.14.

Rather than imagining projecting $\vec a$ and $\vec b$ onto the plane to get the vectors
of Equation 6.2.7, we could imagine projecting the parallelogram spanned by
$\vec a$ and $\vec b$ onto the plane to get the parallelogram spanned by the vectors of
Equation 6.2.7.

Thus $dx_1 \wedge dx_2$ deserves to be called the $(x_1, x_2)$ component of signed
area. Similarly, $dx_2 \wedge dx_3$ and $dx_1 \wedge dx_3$ deserve to be called the $(x_2, x_3)$
and the $(x_1, x_3)$ components of signed area.

We can now interpret Equations 6.2.3 and 6.2.4 geometrically. The 2-form
$dx_1 \wedge dx_2$ tells us that the $(x_1, x_2)$ component of signed area of the
parallelogram spanned by the two vectors in Equation 6.2.3 is $-8$. The 3-form
$dx_1 \wedge dx_2 \wedge dx_4$ tells us that the $(x_1, x_2, x_4)$ component of signed volume of
the parallelepiped spanned by the three vectors in Equation 6.2.4 is $-7$.

Similarly, the 1-form $dx$ gives the $x$ component of signed length of a vector,
while $dy$ gives its $y$ component:

$$dx \begin{bmatrix} 2 \\ -3 \\ 1 \end{bmatrix} = \det 2 = 2 \quad\text{and}\quad dy \begin{bmatrix} 2 \\ -3 \\ 1 \end{bmatrix} = \det(-3) = -3.$$

More generally (and an advantage of $k$-forms is that they generalize so easily
to higher dimensions), we see that

$$dx_i \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix} = \det [v_i] = v_i \tag{6.2.8}$$

is the $i$th component of the signed length of $\vec v$, and that $dx_{i_1} \wedge \cdots \wedge dx_{i_k}$,
evaluated on $(\vec v_1, \ldots, \vec v_k)$, gives the $(x_{i_1}, \ldots, x_{i_k})$ component of signed $k$-dimensional
volume of the $k$-parallelogram spanned by $\vec v_1, \ldots, \vec v_k$.


Elementary forms

There is a great deal of redundancy in the expressions $dx_{i_1} \wedge \cdots \wedge dx_{i_k}$. Consider
for instance $dx_1 \wedge dx_3 \wedge dx_1$. This 3-form takes three vectors in $\mathbb{R}^n$, stacks them
side by side to make an $n \times 3$ matrix, selects the first row, then the third, then
the first again, to make a $3 \times 3$ matrix, and takes its determinant. So far, so good; but
observe that the determinant in question is always 0, independent of what the
vectors were; we have taken the determinant of a $3 \times 3$ matrix for which the
third row is the same as the first; such a determinant is always 0. (Do you see
why?$^1$) So

$$dx_1 \wedge dx_3 \wedge dx_1 = 0; \tag{6.2.9}$$

it takes three vectors and returns 0.

But that is of course not the only way to write the form that takes three
vectors and returns the number 0; both $dx_1 \wedge dx_1 \wedge dx_3$ and $dx_2 \wedge dx_3 \wedge dx_3$
do so as well, and there are others. More generally, if any two of the indices
$i_1, \ldots, i_k$ are equal, we have $dx_{i_1} \wedge \cdots \wedge dx_{i_k} = 0$: the $k$-form $dx_{i_1} \wedge \cdots \wedge dx_{i_k}$,
where two indices are equal, is the $k$-form which takes $k$ vectors and returns 0.

Recall (Definition 4.8.16) that the signature of a permutation $\sigma$, denoted
$\operatorname{sgn}(\sigma)$, is $\operatorname{sgn}(\sigma) = \det M_\sigma$, where $M_\sigma$ is the permutation matrix of $\sigma$.
Theorem 4.8.18 gives a formula for the determinant using the signature.

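Next, consider $dx_1 \wedge dx_3$ and $dx_3 \wedge dx_1$. Evaluated on

$$\vec a = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} \quad\text{and}\quad \vec b = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix},$$

we find

$$\begin{aligned}
dx_1 \wedge dx_3(\vec a, \vec b) &= \det \begin{bmatrix} a_1 & b_1 \\ a_3 & b_3 \end{bmatrix} = a_1 b_3 - a_3 b_1, \\
dx_3 \wedge dx_1(\vec a, \vec b) &= \det \begin{bmatrix} a_3 & b_3 \\ a_1 & b_1 \end{bmatrix} = a_3 b_1 - a_1 b_3.
\end{aligned} \tag{6.2.10}$$

Clearly $dx_1 \wedge dx_3 = -dx_3 \wedge dx_1$; these two 2-forms, evaluated on the same two
vectors, always return opposite numbers.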

More generally, if the integers $i_1, \ldots, i_k$ and $j_1, \ldots, j_k$ are the same integers,
just taken in a different order, so that $j_1 = i_{\sigma(1)}, j_2 = i_{\sigma(2)}, \ldots, j_k = i_{\sigma(k)}$ for
some permutation $\sigma$ of $\{1, \ldots, k\}$, then

$$dx_{j_1} \wedge \cdots \wedge dx_{j_k} = \operatorname{sgn}(\sigma)\, dx_{i_1} \wedge \cdots \wedge dx_{i_k}. \tag{6.2.11}$$

Indeed, $dx_{j_1} \wedge \cdots \wedge dx_{j_k}$ computes the determinant of the same matrix as
$dx_{i_1} \wedge \cdots \wedge dx_{i_k}$, only with the rows permuted by $\sigma$. For instance, $dx_1 \wedge dx_2 = -dx_2 \wedge dx_1$, and

$$dx_1 \wedge dx_2 \wedge dx_3 = dx_2 \wedge dx_3 \wedge dx_1 = dx_3 \wedge dx_1 \wedge dx_2. \tag{6.2.12}$$
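
Equation 6.2.11 can be spot-checked by brute force over all reorderings of a fixed index list. The sketch below is only an illustration on one choice of indices and vectors, not a proof; `det`, `elementary_form` and `sign` are hypothetical helper names, and permutations are written 0-based as Python tuples.

```python
from itertools import permutations


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def elementary_form(indices):
    """dx_{i1} ^ ... ^ dx_{ik} with 1-based indices, in the order given."""
    def form(*vectors):
        return det([[v[i - 1] for v in vectors] for i in indices])
    return form


def sign(perm):
    """Signature of a permutation of 0..k-1, found by counting inversions."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


indices = (1, 2, 4)
vectors = ([1, 2, -1, 1], [3, -2, 1, 2], [0, 1, 2, 1])
base = elementary_form(indices)(*vectors)

results = []
for perm in permutations(range(len(indices))):
    reordered = tuple(indices[p] for p in perm)  # j_m = i_{sigma(m)}
    value = elementary_form(reordered)(*vectors)
    results.append((reordered, value, sign(perm) * base))
    print(reordered, value)
```

For each reordering, the second and third entries of the recorded triple agree, which is what Equation 6.2.11 predicts; cyclic reorderings of three indices leave the value unchanged, as in Equation 6.2.12.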

To eliminate this redundancy, we make the following definition: an
elementary $k$-form is of the form

$$dx_{i_1} \wedge \cdots \wedge dx_{i_k} \quad\text{with } 1 \le i_1 < i_2 < \cdots < i_k \le n; \tag{6.2.13}$$

putting the indices in increasing order selects one particular permutation for
any set of distinct integers $j_1, \ldots, j_k$.


$^1$ The determinant of a square matrix containing two identical columns is always
0, since exchanging them reverses the sign of the determinant, while keeping it the
same. Since (Theorem 4.8.10) $\det A = \det A^\top$, the determinant of a matrix is also 0
if two of its rows are identical.





Definition 6.2.3 (Elementary $k$-forms on $\mathbb{R}^n$). An elementary $k$-form on
$\mathbb{R}^n$ is an expression of the form

$$dx_{i_1} \wedge \cdots \wedge dx_{i_k}, \tag{6.2.14}$$

where $1 \le i_1 < \cdots < i_k \le n$ (and $0 \le k \le n$). Evaluated on the vectors
$\vec v_1, \ldots, \vec v_k$, it gives the determinant of the $k \times k$ matrix obtained by selecting
the $i_1, \ldots, i_k$ rows of the matrix whose columns are the vectors $\vec v_1, \ldots, \vec v_k$.
The only elementary 0-form is the form, denoted 1, which evaluated on zero
vectors returns 1.


That there are no elementary $k$-forms when $k > n$ is an example of the
"pigeon hole" principle: if you have more than $n$ pigeons in $n$ holes, one hole
must contain at least two pigeons. Here we would need to select more than $n$
distinct integers between 1 and $n$.


Note that there are no elementary $k$-forms on $\mathbb{R}^n$ when $k > n$; indeed, there
are no nonzero forms at all when $k > n$: there is no nonzero function $\varphi$ that takes $k > n$
vectors in $\mathbb{R}^n$ and returns a number, such that $\varphi(\vec v_1, \ldots, \vec v_k)$ is multilinear and
antisymmetric. If $\vec v_1, \ldots, \vec v_k$ are vectors in $\mathbb{R}^n$ and $k > n$, then the vectors are
not linearly independent, and at least one of them is a linear combination of
the others, say

$$\vec v_k = \sum_{i=1}^{k-1} a_i \vec v_i. \tag{6.2.15}$$

Then if $\varphi$ is a $k$-form on $\mathbb{R}^n$, evaluation on the vectors $\vec v_1, \ldots, \vec v_k$ gives

$$\varphi(\vec v_1, \ldots, \vec v_k) = \varphi\Bigl(\vec v_1, \ldots, \vec v_{k-1}, \sum_{i=1}^{k-1} a_i \vec v_i\Bigr) = \sum_{i=1}^{k-1} a_i\, \varphi(\vec v_1, \ldots, \vec v_{k-1}, \vec v_i). \tag{6.2.16}$$

In each term of this last sum the vector $\vec v_i$ appears twice among the arguments;
exchanging the two copies changes the sign of the term without changing its
value, so each term is 0. (For an elementary form, each term computes the
determinant of a matrix, two columns of which coincide.)

In terms of the geometric description, this should come as no surprise: you
would expect any kind of three-dimensional volume in $\mathbb{R}^2$ to be zero, and more
generally any $k$-dimensional volume in $\mathbb{R}^n$ to be 0 when $k > n$.

What elementary $k$-forms exist on $\mathbb{R}^4$?$^2$


$^2$ On $\mathbb{R}^4$ there exist

(1) one elementary 0-form, the number 1.

(2) four elementary 1-forms: $dx_1$, $dx_2$, $dx_3$ and $dx_4$.

(3) six elementary 2-forms: $dx_1 \wedge dx_2$, $dx_1 \wedge dx_3$, $dx_1 \wedge dx_4$, $dx_2 \wedge dx_3$, $dx_2 \wedge dx_4$,
and $dx_3 \wedge dx_4$.

(4) four elementary 3-forms: $dx_1 \wedge dx_2 \wedge dx_3$, $dx_1 \wedge dx_2 \wedge dx_4$, $dx_1 \wedge dx_3 \wedge dx_4$,
$dx_2 \wedge dx_3 \wedge dx_4$.

(5) one elementary 4-form: $dx_1 \wedge dx_2 \wedge dx_3 \wedge dx_4$.


All forms are linear combinations of elementary forms


We said above that $dx_{i_1} \wedge \cdots \wedge dx_{i_k}$ is the fundamental example of a $k$-form.
Now we will justify this statement, by showing that any $k$-form is a linear
combination of elementary $k$-forms.

The following definitions say that speaking of such linear combinations makes
sense: we can add $k$-forms and multiply them by scalars in the obvious way.

The Greek letter $\psi$ is pronounced "psi."

Adding $k$-forms: $3\,dx \wedge dy + 2\,dx \wedge dy = 5\,dx \wedge dy$.

Multiplying $k$-forms by scalars: $5(dx \wedge dy + 2\,dx \wedge dz) = 5\,dx \wedge dy + 10\,dx \wedge dz$.

Definition 6.2.4 (Addition of $k$-forms). Let $\varphi$ and $\psi$ be two $k$-forms.
Then

$$\varphi(\vec v_1, \ldots, \vec v_k) + \psi(\vec v_1, \ldots, \vec v_k) = (\varphi + \psi)(\vec v_1, \ldots, \vec v_k).$$

Definition 6.2.5 (Multiplication of $k$-forms by scalars). If $\varphi$ is a
$k$-form and $a$ is a scalar, then

$$(a\varphi)(\vec v_1, \ldots, \vec v_k) = a\bigl(\varphi(\vec v_1, \ldots, \vec v_k)\bigr).$$

Using these definitions of addition and multiplication by scalars, the space of
$k$-forms in $\mathbb{R}^n$ is a vector space. We will now show that the elementary $k$-forms
form a basis of this space.

Definition 6.2.6 ($A^k(\mathbb{R}^n)$). The space of $k$-forms in $\mathbb{R}^n$ is denoted $A^k(\mathbb{R}^n)$.

Theorem 6.2.7. The elementary $k$-forms form a basis for $A^k(\mathbb{R}^n)$.

In other words, every multilinear and antisymmetric function $\varphi$ of $k$ vectors
in $\mathbb{R}^n$ can be uniquely written

$$\varphi = \sum_{1 \le i_1 < \cdots < i_k \le n} a_{i_1 \ldots i_k}\, dx_{i_1} \wedge \cdots \wedge dx_{i_k}, \tag{6.2.17}$$

and in fact the coefficients are given by

$$a_{i_1 \ldots i_k} = \varphi(\vec e_{i_1}, \ldots, \vec e_{i_k}). \tag{6.2.18}$$

Proof. Most of the work is already done, in the proof of Theorem 4.8.4, showing
that the determinant exists and is uniquely characterized by its properties of
multilinearity, antisymmetry, and normalization. (In fact, Theorem 6.2.7 is
Theorem 4.8.4 when $k = n$.) We will illustrate it for the particular case of
2-forms on $\mathbb{R}^3$; this contains the idea of the proof while avoiding hopelessly
complicated notation. Let $\varphi$ be such a 2-form. Then, using multilinearity,
we get the following computation. Forget about the coefficients, and notice
that this equation says that $\varphi$ is completely determined by what it does to the
standard basis vectors:

$$\begin{aligned}
\varphi(\vec v, \vec w) &= \varphi(v_1 \vec e_1 + v_2 \vec e_2 + v_3 \vec e_3,\; w_1 \vec e_1 + w_2 \vec e_2 + w_3 \vec e_3) \\
&= \varphi(v_1 \vec e_1, \vec w) + \varphi(v_2 \vec e_2, \vec w) + \varphi(v_3 \vec e_3, \vec w) \\
&= \varphi(v_1 \vec e_1, w_1 \vec e_1) + \varphi(v_1 \vec e_1, w_2 \vec e_2) + \varphi(v_1 \vec e_1, w_3 \vec e_3) + \cdots \\
&= (v_1 w_2 - v_2 w_1)\,\varphi(\vec e_1, \vec e_2) + (v_1 w_3 - v_3 w_1)\,\varphi(\vec e_1, \vec e_3) + (v_2 w_3 - v_3 w_2)\,\varphi(\vec e_2, \vec e_3).
\end{aligned} \tag{6.2.19}$$

An analogous but messier computation will show the same for any $k$ and $n$:
$\varphi$ is determined by its values on sequences $\vec e_{i_1}, \ldots, \vec e_{i_k}$ with ascending indices.
(The coefficients will be complicated expressions that give determinants, as in
the case above, but you don't need to know that.) So any $k$-form that gives
the same result when evaluated on every sequence $\vec e_{j_1}, \ldots, \vec e_{j_k}$ with ascending
indices coincides with $\varphi$. Thus it is enough to check that

$$\sum_{1 \le i_1 < \cdots < i_k \le n} a_{i_1 \ldots i_k}\, dx_{i_1} \wedge \cdots \wedge dx_{i_k}(\vec e_{j_1}, \ldots, \vec e_{j_k}) = a_{j_1 \ldots j_k}. \tag{6.2.20}$$

This is fairly easy to see. If $(i_1, \ldots, i_k) \ne (j_1, \ldots, j_k)$, then there is at least one
$i$ that does not appear among the $j$'s, so the corresponding $dx_i$, acting on the
matrix $[\vec e_{j_1}, \ldots, \vec e_{j_k}]$, selects a row of zeroes. Thus

$$dx_{i_1} \wedge \cdots \wedge dx_{i_k}(\vec e_{j_1}, \ldots, \vec e_{j_k}) \tag{6.2.21}$$

is the determinant of a matrix with a row of zeroes, so it vanishes. But

$$dx_{j_1} \wedge \cdots \wedge dx_{j_k}(\vec e_{j_1}, \ldots, \vec e_{j_k}) = 1, \tag{6.2.22}$$

since it is the determinant of the identity matrix. □

When reading Theorem 6.2.8, remember that $0! = 1$.

Note that

$$\binom{n}{k} = \binom{n}{n-k}, \quad\text{since}\quad \frac{n!}{(n-k)!\,(n-(n-k))!} = \frac{n!}{k!\,(n-k)!}.$$

In particular, for a given $n$, there are equal numbers of elementary $k$-forms
and elementary $(n-k)$-forms: $A^k(\mathbb{R}^n)$ and $A^{n-k}(\mathbb{R}^n)$ have the same dimension.
Thus, in $\mathbb{R}^3$, there is one elementary 0-form and one elementary 3-form, so the
spaces $A^0(\mathbb{R}^3)$ and $A^3(\mathbb{R}^3)$ can be identified with numbers. There are three
elementary 1-forms and three elementary 2-forms, so the spaces $A^1(\mathbb{R}^3)$ and
$A^2(\mathbb{R}^3)$ can be identified with vectors.
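
The 2-form computation in the proof of Theorem 6.2.7 can be tried on a concrete case. In the sketch below, the antisymmetric matrix `M` and the 2-form `phi(v, w)` built from it as $v^\top M w$ are assumptions chosen purely for illustration; indices are 0-based, and the coefficients are read off as in Equation 6.2.18.

```python
from itertools import combinations

# A sample antisymmetric matrix; phi(v, w) = v^T M w is then bilinear and
# antisymmetric, i.e. a 2-form on R^3. (Chosen arbitrarily for illustration.)
M = [[0, 2, -5],
     [-2, 0, 7],
     [5, -7, 0]]


def phi(v, w):
    return sum(v[i] * M[i][j] * w[j] for i in range(3) for j in range(3))


def basis_vector(i, n=3):
    """Standard basis vector e_i of R^n (0-based index)."""
    return [1 if r == i else 0 for r in range(n)]


# Coefficients a_{ij} = phi(e_i, e_j) for ascending index pairs i < j.
coeffs = {
    (i, j): phi(basis_vector(i), basis_vector(j))
    for i, j in combinations(range(3), 2)
}


def phi_from_elementary(v, w):
    """Sum of a_{ij} * (dx_i ^ dx_j)(v, w), with (dx_i ^ dx_j)(v, w) = v_i w_j - v_j w_i."""
    return sum(a * (v[i] * w[j] - v[j] * w[i]) for (i, j), a in coeffs.items())


v, w = [1, -3, 2], [4, 0, -1]
print(coeffs)
print(phi(v, w), phi_from_elementary(v, w))
```

The two printed values coincide: the three numbers $\varphi(\vec e_i, \vec e_j)$ with $i < j$ determine the form completely, which is the content of Equation 6.2.19 in this special case.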

Theorem 6.2.8 (Dimension of $A^k(\mathbb{R}^n)$). The space $A^k(\mathbb{R}^n)$ has dimension
equal to the binomial coefficient

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}. \tag{6.2.23}$$


Proof. This is just a matter of counting the elements of the basis: i.e., the
number of elementary $k$-forms on $\mathbb{R}^n$. Not for nothing is the binomial coefficient
called "$n$ choose $k$". □

Example 6.2.9 (Dimension of $A^k(\mathbb{R}^3)$). The dimension of $A^0(\mathbb{R}^3)$ and of
$A^3(\mathbb{R}^3)$ is 1, and the dimension of $A^1(\mathbb{R}^3)$ and of $A^2(\mathbb{R}^3)$ is 3, because on $\mathbb{R}^3$
we have

$$\begin{aligned}
\binom{3}{0} &= \frac{3!}{0!\,3!} = 1 \text{ elementary 0-form;} &\qquad \binom{3}{1} &= \frac{3!}{1!\,2!} = 3 \text{ elementary 1-forms;} \\
\binom{3}{2} &= \frac{3!}{2!\,1!} = 3 \text{ elementary 2-forms;} &\qquad \binom{3}{3} &= \frac{3!}{3!\,0!} = 1 \text{ elementary 3-form.}
\end{aligned}$$
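
The counting argument can be carried out mechanically: elementary $k$-forms on $\mathbb{R}^n$ correspond to strictly increasing lists of $k$ indices between 1 and $n$. The following sketch lists them for $\mathbb{R}^4$ and compares the count with the binomial coefficient; the function name `elementary_form_names` and the plain-text notation `dx1 ^ dx2` are conventions of this example only.

```python
from itertools import combinations
from math import comb


def elementary_form_names(n, k):
    """Names of the elementary k-forms on R^n, one per increasing index list."""
    if k == 0:
        return ["1"]  # the single elementary 0-form
    return [
        " ^ ".join(f"dx{i}" for i in idx)
        for idx in combinations(range(1, n + 1), k)
    ]


counts = {}
for k in range(0, 6):
    names = elementary_form_names(4, k)
    counts[k] = len(names)
    print(k, comb(4, k), names)
```

For $k > n$ the list is empty, matching the earlier observation that no elementary $k$-forms exist in that range.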


Forms on vector spaces

So far we have been studying $k$-forms on $\mathbb{R}^n$. When defining orientation, we
will make vital use of $k$-forms on a subspace $E \subset \mathbb{R}^n$. It is no harder to write
the definition when $E$ is an abstract vector space.

Remember that although an element of an (abstract) vector space is called
a vector, it need not be a column vector. But we will concentrate on subspaces
$E \subset \mathbb{R}^n$, whose elements are column vectors.

Definition 6.2.10 (The space $A^k(E)$). Let $E$ be a vector space. Then
$A^k(E)$ is the set of functions that take $k$ vectors in $E$ and return a number,
and which are multilinear and antisymmetric.

Some texts use the notation $\Omega^k(E)$ rather than $A^k(E)$; yet others use
$\Lambda^k(E^*)$.

In Definition 6.2.10 we do not require that $E$ be a subset of $\mathbb{R}^n$: a vector
in $E$ need not be something we can write as a column vector. But until we
assign $E$ a basis we don't know how to write down such a $k$-form.

Just as when $E = \mathbb{R}^n$, the space $A^k(E)$ is a vector space using the obvious
addition and multiplication by scalars.

The main result we will need is the following: 

Proposition 6.2.11 (Dimension of $A^k(E)$). If $E$ has dimension $m$, then
$A^k(E)$ has dimension $\binom{m}{k}$.

Proof. We already know the result when $E = \mathbb{R}^m$, and we will use a basis
to translate from the concrete world of $\mathbb{R}^m$ to the abstract world of $E$. Let
$\vec b_1, \dots, \vec b_m$ be a basis of $E$.

Then the transformation $\Phi_{\{b\}} : \mathbb{R}^m \to E$ given by

\[ \Phi_{\{b\}} \begin{bmatrix} a_1 \\ \vdots \\ a_m \end{bmatrix} = a_1 \vec b_1 + \dots + a_m \vec b_m \tag{6.2.24} \]

is an invertible linear transformation, which performs the translation “concrete
→ abstract.” (See Definition 2.6.12.) We will use the inverse dictionary $\Phi_{\{b\}}^{-1}$.
We claim that the forms $\varphi_{i_1,\dots,i_k}$, $1 \le i_1 < \dots < i_k \le m$, defined by

\[ \varphi_{i_1,\dots,i_k}(\vec v_1, \dots, \vec v_k) = dx_{i_1} \wedge \dots \wedge dx_{i_k}\bigl(\Phi_{\{b\}}^{-1}(\vec v_1), \dots, \Phi_{\{b\}}^{-1}(\vec v_k)\bigr) \tag{6.2.25} \]

form a basis of $A^k(E)$. There is not much to prove: all the properties follow
immediately from the corresponding properties in $\mathbb{R}^m$. One needs to check that
the $\varphi_{i_1,\dots,i_k}$ are multilinear and antisymmetric, that they are linearly
independent, and that they span $A^k(E)$.
Let us see for instance that the $\varphi_{i_1,\dots,i_k}$ are linearly independent. Suppose
that

\[ \sum_{1 \le i_1 < \dots < i_k \le m} a_{i_1,\dots,i_k}\, \varphi_{i_1,\dots,i_k} = 0. \tag{6.2.26} \]

Then applied to the particular vectors

\[ \vec b_{j_i} = \Phi_{\{b\}}(\vec e_{j_i}), \quad i = 1, \dots, k, \tag{6.2.27} \]

we will still get 0. But

Checking the other properties is similar, and left as Exercise 6.2.1.

The last equality of Equation 6.2.28 is Equation 6.2.20.

\[ \sum_{1 \le i_1 < \dots < i_k \le m} a_{i_1,\dots,i_k}\, \varphi_{i_1,\dots,i_k}\bigl(\Phi_{\{b\}}(\vec e_{j_1}), \dots, \Phi_{\{b\}}(\vec e_{j_k})\bigr)
 = \sum_{1 \le i_1 < \dots < i_k \le m} a_{i_1,\dots,i_k}\, dx_{i_1} \wedge \dots \wedge dx_{i_k}(\vec e_{j_1}, \dots, \vec e_{j_k})
 = a_{j_1,\dots,j_k} = 0. \tag{6.2.28} \]

So all the coefficients are 0, and the forms are linearly independent. □
The case of greatest interest to us is the case when $m = k$:


Corollary 6.2.12 follows immediately from Proposition 6.2.11.

Regarding Corollary 6.2.12, there is one “trivial case” when the argument above
doesn't quite apply: when $E = \{0\}$. But it is easy to see the result directly:
an element of $A^0(\{0\})$ is simply a real number: it takes zero vectors and
returns a number. So $A^0(\{0\})$ is not just one-dimensional; it is in fact $\mathbb{R}$.

The wedge product is a messy thing: a complicated summation, over various
shuffles of vectors, of the product of two $k$-forms, each given the sign + or -
according to the rule for permutations.

Figure 6.2.1 explains the use of the word “shuffle” to describe the $\sigma$ over
which we are summing.


Corollary 6.2.12. If $E$ is a $k$-dimensional vector space, then $A^k(E)$ is a
vector space of dimension 1.


The wedge product 

We have used the wedge $\wedge$ to write down forms; now we will see what it means:
it denotes the wedge product, also known as the exterior product.

Definition 6.2.13 (Wedge product). Let $\varphi$ be a $k$-form and $\psi$ be an
$l$-form, both on $\mathbb{R}^n$. Then their wedge product $\varphi \wedge \psi$ is a $(k + l)$-form
that acts on $k + l$ vectors. It is defined by the following sum, where the
summation is over all permutations $\sigma$ of the numbers $1, 2, 3, \dots, k + l$ such
that $\sigma(1) < \sigma(2) < \dots < \sigma(k)$ and $\sigma(k + 1) < \dots < \sigma(k + l)$:

\[ \underbrace{\varphi \wedge \psi(\vec v_1, \vec v_2, \dots, \vec v_{k+l})}_{\text{wedge product evaluated on } k+l \text{ vectors}}
 = \sum_{\text{shuffles } \sigma \in \operatorname{Perm}(k,l)} \operatorname{sgn}(\sigma)\, \varphi\bigl(\underbrace{\vec v_{\sigma(1)}, \dots, \vec v_{\sigma(k)}}_{k \text{ vectors}}\bigr)\, \psi\bigl(\underbrace{\vec v_{\sigma(k+1)}, \dots, \vec v_{\sigma(k+l)}}_{l \text{ vectors}}\bigr). \]

We start on the left with a $(k + l)$-form evaluated on $k + l$ vectors. On the
right we have a somewhat complicated expression involving a $k$-form $\varphi$ acting






Figure 6.2.1. 


Take a pack of $k + l$ cards, cut it to produce subpacks of $k$ cards and $l$
cards, and shuffle them. The permutation you obtain is one where the order of
the cards in the subpacks remains unchanged.


More simply, we note that the 
first permutation involves an even 
number (0) of exchanges, or trans- 
positions, so the signature is posi- 
tive, while the second involves an 
odd number (1), so the signature 
is negative. 


The wedge product of a 0-form $\alpha$ with a $k$-form $\psi$ is a $k$-form,
$\alpha \wedge \psi = \alpha\psi$. In this case, the wedge product coincides with
multiplication by numbers.


on $k$ vectors, and an $l$-form $\psi$ acting on $l$ vectors. To understand the right-hand
side, first consider all possible permutations of the $k + l$ vectors $\vec v_1, \vec v_2, \dots, \vec v_{k+l}$,
dividing each permutation with a bar line | so that there are $k$ vectors to the
left and $l$ vectors to the right, since $\varphi$ acts on $k$ vectors and $\psi$ acts on $l$ vectors.
(For example, if $k = 2$ and $l = 1$, one permutation would be written $\vec v_1 \vec v_2 | \vec v_3$,
another would be written $\vec v_2 \vec v_3 | \vec v_1$, and a third $\vec v_3 \vec v_2 | \vec v_1$.)

Next, choose only those permutations where the indices for the $k$-form (to the
left of the dividing bar) and the indices for the $l$-form (to the right of the bar)
are each, separately and independently, in ascending order, as illustrated by
Figure 6.2.1. (For $k = 2$ and $l = 1$, the allowable choices are $\vec v_1 \vec v_2 | \vec v_3$,
$\vec v_1 \vec v_3 | \vec v_2$, and $\vec v_2 \vec v_3 | \vec v_1$.) We assign each chosen permutation its sign,
according to the rule given in Definition 4.8.16, and finally, take the sum.


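Example 6.2.14 (The wedge product of two 1-forms). If $\varphi$ and $\psi$ are
both 1-forms, we have two permutations, $\vec v_1 | \vec v_2$ and $\vec v_2 | \vec v_1$, both allowable under
our “ascending order” rule. The sign for the first is positive, since

\[ \begin{bmatrix} \vec v_1 \\ \vec v_2 \end{bmatrix} \to \begin{bmatrix} \vec v_1 \\ \vec v_2 \end{bmatrix} \quad \text{gives the permutation matrix} \quad \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \]

with determinant $+1$. The sign for the second is negative, since

\[ \begin{bmatrix} \vec v_1 \\ \vec v_2 \end{bmatrix} \to \begin{bmatrix} \vec v_2 \\ \vec v_1 \end{bmatrix} \quad \text{gives the permutation matrix} \quad \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \]

with determinant $-1$. So in this case the equation of Definition 6.2.13 becomes

\[ (\varphi \wedge \psi)(\vec v_1, \vec v_2) = \varphi(\vec v_1)\, \psi(\vec v_2) - \varphi(\vec v_2)\, \psi(\vec v_1). \tag{6.2.29} \]

We see that the 2-form $dx_1 \wedge dx_2$,

\[ dx_1 \wedge dx_2(\vec a, \vec b) = \det \begin{bmatrix} a_1 & b_1 \\ a_2 & b_2 \end{bmatrix} = a_1 b_2 - a_2 b_1, \tag{6.2.30} \]

is indeed equal to the wedge product of the 1-forms $dx_1$ and $dx_2$, which,
evaluated on the same two vectors, gives

\[ dx_1 \wedge dx_2(\vec a, \vec b) = dx_1(\vec a)\, dx_2(\vec b) - dx_1(\vec b)\, dx_2(\vec a) = a_1 b_2 - a_2 b_1. \tag{6.2.31} \]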

So our use of the wedge in naming the elementary forms is coherent with its 
use to denote this special kind of multiplication. 
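The shuffle sum of Definition 6.2.13 is easy to mechanize. The following is a minimal sketch, not part of the text: the helper names `signature`, `shuffles`, `wedge`, and `dx` are illustrative assumptions, forms are modeled as plain Python functions of tuples, and the test vectors are arbitrary. It lists the shuffles with their signs and checks Equation 6.2.31 on one pair of vectors.

```python
from itertools import combinations

def signature(perm):
    """Signature of a permutation: +1 for an even number of inversions, -1 for odd."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def shuffles(k, l):
    """All (k, l)-shuffles as (sign, left indices, right indices), 1-based."""
    indices = range(1, k + l + 1)
    result = []
    for left in combinations(indices, k):          # ascending by construction
        right = tuple(i for i in indices if i not in left)
        result.append((signature(left + right), left, right))
    return result

def wedge(phi, k, psi, l):
    """Wedge product of a k-form phi and an l-form psi (both plain functions)."""
    def form(*vectors):
        return sum(sgn
                   * phi(*(vectors[i - 1] for i in left))
                   * psi(*(vectors[i - 1] for i in right))
                   for sgn, left, right in shuffles(k, l))
    return form

def dx(i):
    """Elementary 1-form dx_i: returns the i-th coordinate (1-based) of a vector."""
    return lambda v: v[i - 1]

a, b = (3, 5), (2, 7)                       # arbitrary test vectors
value = wedge(dx(1), 1, dx(2), 1)(a, b)     # dx1(a)dx2(b) - dx1(b)dx2(a)
determinant = a[0] * b[1] - a[1] * b[0]     # a1*b2 - a2*b1
print(shuffles(2, 1), value, determinant)
```

For $k = 2$, $l = 1$ the list returned by `shuffles` contains exactly three entries, with signs $+$, $-$, $+$.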


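Example 6.2.15 (The wedge product of a 2-form and a 1-form). If $\varphi$
is a 2-form and $\psi$ is a 1-form, then we have the six permutations

\[ \vec v_1 \vec v_2 | \vec v_3, \quad \vec v_1 \vec v_3 | \vec v_2, \quad \vec v_2 \vec v_3 | \vec v_1, \quad \vec v_3 \vec v_1 | \vec v_2, \quad \vec v_2 \vec v_1 | \vec v_3, \quad \text{and} \quad \vec v_3 \vec v_2 | \vec v_1. \tag{6.2.32} \]

The first three are in ascending order, so we have three permutations to sum,

\[ +(\vec v_1 \vec v_2 | \vec v_3), \quad -(\vec v_1 \vec v_3 | \vec v_2), \quad +(\vec v_2 \vec v_3 | \vec v_1), \tag{6.2.33} \]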




The wedge product $\varphi \wedge \psi$ satisfies a) and b) of Definition 6.2.1 for a form
(multilinearity and antisymmetry). Multilinearity is not hard to see;
antisymmetry is harder (as was the proof of antisymmetry for the determinant).


You are asked to prove Proposi- 
tion 6.2.16 in Exercise 6.2.4. Part 
(2) is quite a bit harder than the 
other two. Exercise 6.2.5 asks you 
to verify that Example 6.2.14 does 
not commute, and that Example 
6.2.15 does. 

Part (2) justifies the omission of parentheses in the $k$-form

\[ dx_{i_1} \wedge dx_{i_2} \wedge \dots \wedge dx_{i_k}; \]

all the ways of putting parentheses in the expression give the same result.


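giving the wedge product

\[ \varphi \wedge \psi(\vec v_1, \vec v_2, \vec v_3) = \varphi(\vec v_1, \vec v_2)\, \psi(\vec v_3) - \varphi(\vec v_1, \vec v_3)\, \psi(\vec v_2) + \varphi(\vec v_2, \vec v_3)\, \psi(\vec v_1). \tag{6.2.34} \]

Again, let's compare this result with what we get using Definition 6.2.3,
setting $\varphi = dx_1 \wedge dx_2$ and $\psi = dx_3$; to avoid double indices we will rename the
vectors $\vec v_1, \vec v_2, \vec v_3$, calling them $\vec u$, $\vec v$, and $\vec w$. Using Definition 6.2.3 we get

\[ (dx_1 \wedge dx_2) \wedge dx_3(\vec u, \vec v, \vec w) = \det \begin{bmatrix} u_1 & v_1 & w_1 \\ u_2 & v_2 & w_2 \\ u_3 & v_3 & w_3 \end{bmatrix}
 = u_1 v_2 w_3 - u_1 v_3 w_2 - u_2 v_1 w_3 + u_2 v_3 w_1 + u_3 v_1 w_2 - u_3 v_2 w_1. \tag{6.2.35} \]

If instead we use Equation 6.2.34 for the wedge product, we get

\[ \begin{aligned}
(dx_1 \wedge dx_2) \wedge dx_3(\vec u, \vec v, \vec w) &= (dx_1 \wedge dx_2)(\vec u, \vec v)\, dx_3(\vec w) - (dx_1 \wedge dx_2)(\vec u, \vec w)\, dx_3(\vec v) + (dx_1 \wedge dx_2)(\vec v, \vec w)\, dx_3(\vec u) \\
&= \det \begin{bmatrix} u_1 & v_1 \\ u_2 & v_2 \end{bmatrix} w_3 - \det \begin{bmatrix} u_1 & w_1 \\ u_2 & w_2 \end{bmatrix} v_3 + \det \begin{bmatrix} v_1 & w_1 \\ v_2 & w_2 \end{bmatrix} u_3 \\
&= u_1 v_2 w_3 - u_1 v_3 w_2 - u_2 v_1 w_3 + u_2 v_3 w_1 + u_3 v_1 w_2 - u_3 v_2 w_1.
\end{aligned} \tag{6.2.36} \]

△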


Properties of the wedge product 

The wedge product behaves much like ordinary multiplication, except that one 
needs to be careful about the sign, because of skew commutativity: 

Proposition 6.2.16 (Properties of the wedge product). The wedge
product has the following properties:

(1) distributivity: $\varphi \wedge (\psi_1 + \psi_2) = \varphi \wedge \psi_1 + \varphi \wedge \psi_2$. 6.2.37

(2) associativity: $(\varphi_1 \wedge \varphi_2) \wedge \varphi_3 = \varphi_1 \wedge (\varphi_2 \wedge \varphi_3)$. 6.2.38

(3) skew commutativity: If $\varphi$ is a $k$-form and $\psi$ is an $l$-form, then

$\varphi \wedge \psi = (-1)^{kl}\, \psi \wedge \varphi$. 6.2.39

Note that in Equation 6.2.39 the $\varphi$ and $\psi$ change positions. For example, if
$\varphi = dx_1 \wedge dx_2$ and $\psi = dx_3$, skew commutativity says that

\[ (dx_1 \wedge dx_2) \wedge dx_3 = (-1)^2\, dx_3 \wedge (dx_1 \wedge dx_2), \quad \text{i.e.,} \]

\[ \det \begin{bmatrix} u_1 & v_1 & w_1 \\ u_2 & v_2 & w_2 \\ u_3 & v_3 & w_3 \end{bmatrix} = \det \begin{bmatrix} u_3 & v_3 & w_3 \\ u_1 & v_1 & w_1 \\ u_2 & v_2 & w_2 \end{bmatrix}, \]

which you can confirm either by observing that the two matrices differ by two
exchanges of rows (changing the sign twice) or by carrying out the computation.
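A numerical spot check is no substitute for the proof requested in Exercise 6.2.4, but it can catch sign errors. The sketch below is illustrative only: the helper names (`sign`, `wedge`, `dx1`, `dx2`, `dx3`) and the integer test vectors are assumptions. It builds the wedge product from the shuffle sum of Definition 6.2.13 and tests associativity and skew commutativity on one triple of vectors.

```python
from itertools import combinations

def sign(perm):
    """Signature of a permutation, computed from its number of inversions."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def wedge(phi, k, psi, l):
    """Shuffle-sum wedge product of a k-form phi and an l-form psi."""
    def form(*vs):
        idx = tuple(range(k + l))
        total = 0
        for left in combinations(idx, k):
            right = tuple(i for i in idx if i not in left)
            total += (sign(left + right)
                      * phi(*[vs[i] for i in left])
                      * psi(*[vs[i] for i in right]))
        return total
    return form

# Elementary 1-forms on R^3: dx_i returns the i-th coordinate.
dx1, dx2, dx3 = [lambda v, i=i: v[i] for i in range(3)]

u, v, w = (1, 2, 3), (0, 1, 4), (5, 6, 0)   # arbitrary integer test vectors

left_grouping = wedge(wedge(dx1, 1, dx2, 1), 2, dx3, 1)(u, v, w)   # (dx1^dx2)^dx3
right_grouping = wedge(dx1, 1, wedge(dx2, 1, dx3, 1), 2)(u, v, w)  # dx1^(dx2^dx3)
swapped = wedge(dx3, 1, wedge(dx1, 1, dx2, 1), 2)(u, v, w)         # dx3^(dx1^dx2)

# Two 1-forms anticommute: k = l = 1 gives the factor (-1)^1.
forward = wedge(dx1, 1, dx2, 1)(u, v)
backward = wedge(dx2, 1, dx1, 1)(u, v)
print(left_grouping, right_grouping, swapped, forward, backward)
```

With $k = 2$ and $l = 1$ the factor $(-1)^{kl}$ is $+1$, so `swapped` agrees with `left_grouping`; with $k = l = 1$ it is $-1$, so `backward` is the negative of `forward`.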


6.3 Integrating Form Fields over Parametrized 
Domains 


When $k = 2$, a direct basis $\vec v_1, \vec v_2$ is one where $\vec v_2$ lies counterclockwise
from $\vec v_1$, if $\vec e_1$ and $\vec e_2$ are drawn in the standard way. In $\mathbb{R}^3$, a basis
$\vec v_1, \vec v_2, \vec v_3$ is direct if it is right-handed, again if $\vec e_1$, $\vec e_2$ and $\vec e_3$ are drawn
in the standard right-handed way. (The right-hand rule is described in Section
1.4.)


In $P^o_x(\vec v_1, \dots, \vec v_k)$, the little $o$ is there to remind you that this is an
oriented parallelogram: an oriented subset of $\mathbb{R}^n$.


The objective of this chapter is to define integration and differentiation over 
oriented domains. We now make our first stab at defining integration of forms; 
we will translate these results into the language of vector calculus in Section 
6.4 and will return to orientation and integration of form fields in Section 6.5. 

We say that $k$ linearly independent vectors $\vec v_1, \dots, \vec v_k$ in $\mathbb{R}^k$ form a direct
basis of $\mathbb{R}^k$ if $\det[\vec v_1, \dots, \vec v_k] > 0$, otherwise an indirect basis. Of course, this
depends on the order in which the vectors $\vec v_1, \dots, \vec v_k$ are taken. We want to
think of things like the $k$-parallelogram $P_x(\vec v_1, \dots, \vec v_k)$ in $\mathbb{R}^k$ (which is simply
a subset of $\mathbb{R}^k$) plus the information that the spanning vectors form a direct or
an indirect basis.

The situation when there are $k$ vectors in $\mathbb{R}^n$ and $k \ne n$ is a little different.
Consider a parallelogram in $\mathbb{R}^3$ spanned by two vectors, for instance



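\[ \vec v_1 = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} \quad \text{and} \quad \vec v_2 = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}. \]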


This parallelogram has two orientations, but neither is more “direct” than the 
other. Below we define orientation for such objects. 

An oriented $k$-parallelogram in $\mathbb{R}^n$, denoted $\pm P^o_x(\vec v_1, \dots, \vec v_k)$, is a
$k$-parallelogram as defined in Definition 5.1.1, except that this time all the symbols
written are part of the data: the anchor point, the vectors $\vec v_i$, and the sign. As
usual, the sign is omitted when it is positive.

Definition 6.3.1 (Oriented $k$-parallelogram). An oriented $k$-parallelogram
$\pm P^o_x(\vec v_1, \dots, \vec v_k)$ is a parallelogram in which the sign and the order
of the vectors are part of the data. The oriented $k$-parallelograms

$P^o_x(\vec v_1, \dots, \vec v_k)$ and $-P^o_x(\vec v_1, \dots, \vec v_k)$

have opposite orientations, as do two oriented $k$-parallelograms
$P^o_x(\vec v_1, \dots, \vec v_k)$ if two of the vectors are exchanged.


Two oriented $k$-parallelograms are opposite if the data for the two is the
same, except that either (1) the sign is changed, or (2) two of the vectors are
exchanged (or, more generally, there is an odd number of transpositions of
vectors). They are equal if the data is the same except that (1) the order of
the vectors differs by an even number of transpositions, or (2) the order differs
by an odd number of transpositions, and the sign is changed. For example,
Another way of saying this is that if a permutation $\sigma$ is applied to the
vectors, the parallelogram is multiplied by the signature of $\sigma$:

\[ P^o_x(\vec v_{\sigma(1)}, \dots, \vec v_{\sigma(k)}) = \operatorname{sgn}(\sigma)\, P^o_x(\vec v_1, \dots, \vec v_k). \]


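$P^o_x(\vec v_1, \vec v_2)$ and $P^o_x(\vec v_2, \vec v_1)$ are opposite;

$P^o_x(\vec v_1, \vec v_2)$ and $-P^o_x(\vec v_2, \vec v_1)$ are equal;

$P^o_x(\vec v_1, \vec v_2, \vec v_3)$ and $P^o_x(\vec v_2, \vec v_1, \vec v_3)$ are opposite;

$P^o_x(\vec v_1, \vec v_2, \vec v_3)$ and … are opposite;

$P^o_x(\vec v_1, \vec v_2, \vec v_3)$ and $P^o_x(\vec v_2, \vec v_3, \vec v_1)$ are equal.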


Are $P^o_x(\vec v_1, \vec v_3, \vec v_2)$, $P^o_x(\vec v_3, \vec v_1, \vec v_2)$, $-P^o_x(\vec v_2, \vec v_1, \vec v_3)$ equal or opposite?$^3$
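The bookkeeping in these comparisons is mechanical: multiply the two signs by the signature of the permutation relating the two orderings. Here is a small sketch, under the assumption that both parallelograms have the same anchor point and are spanned by the same distinct vectors, identified here by integer labels. The function names `permutation_sign` and `compare` are hypothetical.

```python
def permutation_sign(perm):
    """Signature of a permutation given as a list of distinct integers."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def compare(sign_a, order_a, sign_b, order_b):
    """Return 'equal' or 'opposite' for two oriented parallelograms.

    sign_* is +1 or -1; order_* is a tuple of vector labels, e.g. (1, 2, 3)
    stands for the ordered spanning vectors v1, v2, v3.
    """
    position = {label: i for i, label in enumerate(order_a)}
    perm = [position[label] for label in order_b]   # how order_b rearranges order_a
    total = sign_a * sign_b * permutation_sign(perm)
    return "equal" if total == 1 else "opposite"

print(compare(+1, (1, 2), +1, (2, 1)))          # swapping two vectors
print(compare(+1, (1, 2), -1, (2, 1)))          # swap plus sign change
print(compare(+1, (1, 2, 3), +1, (2, 3, 1)))    # cyclic permutation
print(compare(+1, (3, 1, 2), -1, (2, 1, 3)))    # two of the parallelograms in the question
print(compare(+1, (1, 3, 2), +1, (3, 1, 2)))
```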


Form fields 


The word “field” means data that varies from point to point. The number a form
field gives depends on the point at which it is evaluated. A $k$-form field is
also called a “differential form.” We find “differential” a mystifying word; it
is almost impossible to make sense of the word “differential” as it is used in
first year calculus. We know a professor who claims that he has been teaching
“differentials” for 20 years and still doesn't know what they are.


Most often, rather than integrate a $k$-form, we will integrate a $k$-form field. A
$k$-form field $\varphi$ on an open subset $U$ of $\mathbb{R}^n$ assigns a $k$-form $\varphi(x)$ to every point
$x$ in $U$. While the number returned by a $k$-form depends only on $k$ vectors,
the number returned by a $k$-form field depends also on the point at which it is
evaluated: a $k$-form is a function of $k$ vectors, but a $k$-form field is a function
of an oriented $k$-parallelogram $P^o_x(\vec v_1, \dots, \vec v_k)$, which is anchored at $x$.

Definition 6.3.2 ($k$-form field). A $k$-form field on an open subset $U \subset \mathbb{R}^n$
is a function that takes $k$ vectors $\vec v_1, \dots, \vec v_k$ anchored at a point $x \in U$, and
which returns a number. It is multilinear and antisymmetric as a function
of the $\vec v$'s.

We already know how to write $k$-form fields: it is any expression of the form

\[ \varphi = \sum_{1 \le i_1 < \dots < i_k \le n} a_{i_1,\dots,i_k}(x)\, dx_{i_1} \wedge \dots \wedge dx_{i_k}, \tag{6.3.2} \]

where the $a_{i_1,\dots,i_k}$ are real-valued functions of $x \in U$.


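Example 6.3.3 (A 2-form field on $\mathbb{R}^3$). The form field $\cos(xz)\, dx \wedge dy$ is a
2-form field on $\mathbb{R}^3$. Below it is evaluated twice, each time on the same vectors,
but at different points:

\[ \cos(xz)\, dx \wedge dy \left( P^o_{\left(\begin{smallmatrix} 1 \\ 2 \\ \pi \end{smallmatrix}\right)} \left( \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix} \right) \right) = (\cos(1 \cdot \pi)) \det \begin{bmatrix} 1 & 2 \\ 0 & 2 \end{bmatrix} = -2, \]

\[ \cos(xz)\, dx \wedge dy \left( P^o_{\left(\begin{smallmatrix} 1/2 \\ 2 \\ \pi \end{smallmatrix}\right)} \left( \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix} \right) \right) = (\cos(\tfrac12 \cdot \pi)) \det \begin{bmatrix} 1 & 2 \\ 0 & 2 \end{bmatrix} = 0. \]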



3 $P^o_x(\vec v_3, \vec v_1, \vec v_2) = -P^o_x(\vec v_2, \vec v_1, \vec v_3)$. Both are opposite to $P^o_x(\vec v_1, \vec v_3, \vec v_2)$.



A singularity is a point where a 
subset fails to meet the criteria for 
a smooth manifold; for a curve, for 
example, it could be a point where 
the curve intersects itself. But 
a singularity can be much worse 
than that; for example, the curve 
could go through itself infinitely 
many times at such a point. 


The $k$-parallelogram in Equation 6.3.3 is oriented: it comes with the spanning
vectors in a particular order, which depends on the order in which we took our
variables in $\mathbb{R}^k$.


We hope that this discussion 
convinces you that Definition 
6.3.4 corresponds to the heuristic 
description of how the integral of 
a form should work. 
Integrating form fields over parametrized domains


Before we can integrate form fields over oriented domains, we must define the 
orientation of domains; we will do this in Section 6.5. Here, as an introduction, 
we will show how to integrate form fields over domains that come naturally 
equipped with orientation-preserving parametrizations: parametrized domains. 

A parametrized $k$-dimensional domain in $\mathbb{R}^n$ is the image $\gamma(A)$ of a $C^1$
mapping $\gamma$ that goes from a pavable subset $A$ of $\mathbb{R}^k$ to $\mathbb{R}^n$. Such a domain
$\gamma(A)$ may well not be a smooth manifold; a mapping $\gamma$ always parametrizes
something or other in $\mathbb{R}^n$, but $\gamma(A)$ may have horrible singularities (although
it is more likely to be mainly a $k$-dimensional manifold with some bad points).
If we had to assign orientation to $\gamma(A)$ this would be a problem; we will see in
Section 6.5 how to assign orientation to a manifold, but we don't know how to
assign orientation to something that is “mainly a $k$-dimensional manifold with
some bad points.”

Fortunately, for our purposes here it doesn't matter how nasty the image is.
We don't need to know what $\gamma(A)$ looks like, and we don't have to determine
its orientation. We are not thinking of $\gamma(A)$ in its own right, but as “the result
of $\gamma$ acting on $A$.” A parametrization by a mapping $\gamma$ automatically carries an
orientation: $\gamma$ maps an oriented $k$-parallelogram $P^o_x(\vec v_1, \dots, \vec v_k)$ to a curvilinear
parallelogram that can be approximated by $P^o_{\gamma(x)}(\vec{D_1}\gamma(x), \dots, \vec{D_k}\gamma(x))$; the
order of the vectors in this $k$-parallelogram depends on the order of the variables
in $\mathbb{R}^k$. To the extent that $\gamma(A)$ has an orientation, it is oriented by this order
of vectors.

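The image $\gamma(A)$ comes with a natural decomposition into little pieces: take
some $N$, and decompose $\gamma(A)$ into the little pieces $\gamma(C \cap A)$, where $C \in \mathcal{D}_N(\mathbb{R}^k)$.
Such a piece $\gamma(C \cap A)$ is naturally well approximated by a $k$-parallelogram: if
$u \in \mathbb{R}^k$ is the lower left-hand corner of $C$, the parallelogram

\[ P^o_{\gamma(u)}\left( \frac{1}{2^N} \vec{D_1}\gamma(u), \dots, \frac{1}{2^N} \vec{D_k}\gamma(u) \right) \tag{6.3.3} \]

is the image of $C$ by the linear approximation

\[ w \mapsto \gamma(u) + [D\gamma(u)](w - u) \quad \text{to } \gamma \text{ at } u. \tag{6.3.4} \]

So if $\varphi$ is a $k$-form field on $\mathbb{R}^n$ (or at least on a neighborhood of $\gamma(A)$), an
approximation to

\[ \int_{\gamma(A)} \varphi \tag{6.3.5} \]

should be

\[ \sum_{\substack{C \in \mathcal{D}_N(\mathbb{R}^k) \\ A \cap C \ne \emptyset}} \varphi\left( P^o_{\gamma(u)}\left( \frac{1}{2^N} \vec{D_1}\gamma(u), \dots, \frac{1}{2^N} \vec{D_k}\gamma(u) \right) \right)
 = \operatorname{vol}_k(C) \sum_{\substack{C \in \mathcal{D}_N(\mathbb{R}^k) \\ A \cap C \ne \emptyset}} \varphi\left( P^o_{\gamma(u)}\left( \vec{D_1}\gamma(u), \dots, \vec{D_k}\gamma(u) \right) \right). \tag{6.3.6} \]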
Since $k = 1$, in the first line of Equation 6.3.9, we have the single vector
$\begin{bmatrix} -R \sin u \\ R \cos u \end{bmatrix}$ rather than the $\vec{D_1}\gamma(u), \dots, \vec{D_k}\gamma(u)$ of Definition
6.3.4.

In the second line of Equation 6.3.9, the first $R \cos u$ is $x$, while the second
is the number given by $dy$ evaluated on the parallelogram. Similarly, $R \sin u$ is
$y$, and $(-R \sin u)$ is the number given by $dx$ evaluated on the parallelogram.

Remember that $\varphi$ is the integrand on the left side of Equation 6.3.8, just as
$|d^k u|$ is the integrand on the right. We use $\varphi$ to avoid writing

\[ \sum_{1 \le i_1 < \dots < i_k \le n} a_{i_1,\dots,i_k}(x)\, dx_{i_1} \wedge \dots \wedge dx_{i_k}. \]


Why must we choose the pos- 
itive orientation? The interval 
[a, 0] is a subset of R, so the ori- 
entation is determined by the ori- 
entation of R. The standard ori- 
entation of R is from negative to 
positive. 

Similarly, $\mathbb{R}^2$ and $\mathbb{R}^3$ also have a standard orientation: that in which the
standard basis vectors are written in the normal way, giving
$\det[\vec e_1, \vec e_2] = 1 > 0$ and $\det[\vec e_1, \vec e_2, \vec e_3] = 1 > 0$. For $\mathbb{R}^3$, this is equivalent
to the right-hand rule (see Proposition 1.4.20). This will be discussed further
in Section 6.5.


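But this last sum is a Riemann sum for the integral

\[ \int_A \varphi\left( P^o_{\gamma(u)}\left( \vec{D_1}\gamma(u), \dots, \vec{D_k}\gamma(u) \right) \right) |d^k u|. \tag{6.3.7} \]

To be rigorous, we define $\int_{\gamma(A)} \varphi$ to be the above integral:

Definition 6.3.4 (Integrating a $k$-form field over a parametrized
domain). Let $A \subset \mathbb{R}^k$ be a pavable set and $\gamma : A \to \mathbb{R}^n$ be a $C^1$ mapping.
Then the integral of the $k$-form field $\varphi$ over $\gamma(A)$ is

\[ \int_{\gamma(A)} \varphi = \int_A \underbrace{\varphi\left( P^o_{\gamma(u)}\left( \vec{D_1}\gamma(u), \dots, \vec{D_k}\gamma(u) \right) \right)}_{\text{This is a function of } u.} |d^k u|. \tag{6.3.8} \]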


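Example 6.3.5 (Integrating a 1-form field over a parametrized curve).
Consider a case where $k = 1$, $n = 2$. We will use $\gamma(u) = \begin{pmatrix} R \cos u \\ R \sin u \end{pmatrix}$ and
take $A$ to be the interval $[0, a]$, for some $a > 0$. If we integrate the 1-form field
$x\, dy - y\, dx$ over $\gamma(A)$ using the above definition, we find

\[ \begin{aligned}
\int_{\gamma(A)} (x\, dy - y\, dx) &= \int_{[0,a]} (x\, dy - y\, dx) \left( P^o_{\left(\begin{smallmatrix} R \cos u \\ R \sin u \end{smallmatrix}\right)} \begin{bmatrix} -R \sin u \\ R \cos u \end{bmatrix} \right) |du| \\
&= \int_{[0,a]} \bigl( R \cos u \, R \cos u - (R \sin u)(-R \sin u) \bigr) |du| = \int_{[0,a]} R^2 \, |du| \\
&= \int_0^a R^2 \, du = R^2 a.
\end{aligned} \tag{6.3.9} \]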


What would we have gotten if a < 0? Until the bottom line, everything is the 
same. But then we have to decide how to interpret [0, a]. Should we write 



or 



6.3.10 


We have to choose the second, because we are now integrating over an oriented
interval, and we must choose the positive orientation. So the answer is still
$R^2 a$, which is now negative. △
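Definition 6.3.4 translates directly into a Riemann sum. The following sketch treats only the case $a > 0$ of Example 6.3.5 and approximates the integral with a midpoint rule; the function name `integrate_1form_over_curve`, the values of `R` and `a`, and the number of subintervals are illustrative assumptions.

```python
import math

def integrate_1form_over_curve(form, gamma, dgamma, a, b, n=10_000):
    """Midpoint Riemann sum of form(gamma(u), gamma'(u)) for u in [a, b].

    form(p, v) is the 1-form field evaluated on the vector v anchored at p.
    """
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        u = a + (i + 0.5) * h
        total += form(gamma(u), dgamma(u)) * h
    return total

R, a = 2.0, 1.5   # arbitrary radius and parameter bound, with a > 0

gamma = lambda u: (R * math.cos(u), R * math.sin(u))
dgamma = lambda u: (-R * math.sin(u), R * math.cos(u))   # derivative of gamma

# x dy - y dx on a vector v = (v1, v2) anchored at (x, y): x*v2 - y*v1
form = lambda p, v: p[0] * v[1] - p[1] * v[0]

approx = integrate_1form_over_curve(form, gamma, dgamma, 0.0, a)
exact = R**2 * a
print(approx, exact)
```

Because the integrand reduces to the constant $R^2$, the Riemann sum agrees with $R^2 a$ up to rounding error.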


Example 6.3.6 (Another parametrized curve). In Example 6.3.5, you
probably saw that $\gamma$ was parametrizing an arc of circle. To carry out the sort
of computation we are discussing, the image need not be a smooth curve. For
that matter, we don't need to have any idea what $\gamma(A)$ looks like.
Take for instance $\gamma(t) = \begin{pmatrix} 1 + t^2 \\ \arctan t \end{pmatrix}$, set $A = [0, a]$ for some $a > 0$ and
$\varphi = x\, dy$. Then

\[ \int_{\gamma(A)} \varphi = \int_{[0,a]} x\, dy \left( P^o_{\left(\begin{smallmatrix} 1 + t^2 \\ \arctan t \end{smallmatrix}\right)} \begin{bmatrix} 2t \\ 1/(1 + t^2) \end{bmatrix} \right) |dt| = \int_{[0,a]} \frac{1 + t^2}{1 + t^2} \, |dt| = a. \tag{6.3.11} \]

△


In Equation 6.3.13, 0 < s, t < 1 
means 

0 < s < 1 and 0 < t < 1. 


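Example 6.3.7 (Integrating a 2-form field over a parametrized surface
in $\mathbb{R}^3$). Let us compute

\[ \int_{\gamma(C)} dx \wedge dy + y\, dx \wedge dz \tag{6.3.12} \]

over the parametrized domain $\gamma(C)$ where

\[ \gamma \begin{pmatrix} s \\ t \end{pmatrix} = \begin{pmatrix} s + t \\ s^2 \\ t^2 \end{pmatrix}, \qquad C = \left\{ \begin{pmatrix} s \\ t \end{pmatrix} \;\middle|\; 0 \le s, t \le 1 \right\}. \tag{6.3.13} \]

Applying Definition 6.3.4, we find

\[ \begin{aligned}
\int_{\gamma(C)} dx \wedge dy + y\, dx \wedge dz &= \int_0^1 \int_0^1 (dx \wedge dy + y\, dx \wedge dz) \left( P^o_{\gamma\left(\begin{smallmatrix} s \\ t \end{smallmatrix}\right)} \left( \begin{bmatrix} 1 \\ 2s \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 2t \end{bmatrix} \right) \right) |ds\, dt| \\
&= \int_0^1 \int_0^1 \bigl( -2s + s^2 (2t) \bigr) |ds\, dt| \\
&= \left[ -t + \frac{t^2}{3} \right]_0^1 = -\frac{2}{3}.
\end{aligned} \tag{6.3.14} \]

△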
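As a check on the arithmetic, the same integral can be approximated numerically. This is a sketch under stated assumptions: the partial derivatives $(1, 2s, 0)$ and $(1, 0, 2t)$ and the coordinate $y = s^2$ are read off from the computation in Equation 6.3.14, and the function names and grid size are illustrative.

```python
def integrand(s, t):
    """(dx^dy + y dx^dz) evaluated on the partial derivatives of gamma at (s, t)."""
    d1 = (1.0, 2.0 * s, 0.0)     # derivative with respect to s
    d2 = (1.0, 0.0, 2.0 * t)     # derivative with respect to t
    y = s * s                    # second coordinate of gamma(s, t)
    dx_dy = d1[0] * d2[1] - d1[1] * d2[0]   # det of the x and y rows
    dx_dz = d1[0] * d2[2] - d1[2] * d2[0]   # det of the x and z rows
    return dx_dy + y * dx_dz

def midpoint_double_integral(f, n=200):
    """Midpoint rule for the integral of f over the unit square [0,1] x [0,1]."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        for j in range(n):
            t = (j + 0.5) * h
            total += f(s, t)
    return total * h * h

approx = midpoint_double_integral(integrand)
print(approx)   # should be close to -2/3
```

The negative value is not an error: the sign of the integral of a form field depends on the order of the parameters $s, t$.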


6.4 Forms and Vector Calculus 

The real difficulty with forms is imagining what they are. What “is” $dx_1 \wedge dx_2 + dx_3 \wedge dx_4$?
We have seen that it is the function that takes two vectors
in $\mathbb{R}^4$, projects them first onto the $(x_1, x_2)$-plane and takes the signed area of
the resulting parallelogram, then projects them onto the $(x_3, x_4)$-plane, takes
the signed area of that parallelogram, and finally adds the two signed areas.
Acquiring an intuitive understanding of what sort of information a $k$-form
encodes really is difficult. In some sense, a first course in electromagnetism
is largely a matter of understanding what sort of beast the electromagnetic
field is, namely a 2-form field on $\mathbb{R}^4$.


But that description is extremely convoluted, and although it isn’t too hard to 
use it in computations, it hardly expresses understanding. 

However, in $\mathbb{R}^3$, it really is possible to visualize all forms and form fields,
because they can be described in terms of functions and vector fields. There
are four kinds of forms on $\mathbb{R}^3$: 0-forms, 1-forms, 2-forms, and 3-forms. Each
has its own personality.

0-form fields. In $\mathbb{R}^3$ and in any $\mathbb{R}^n$, a 0-form is simply a number, and a 0-form
field is simply a function. If $U \subset \mathbb{R}^n$ is an open subset and
$f : U \to \mathbb{R}$ is a function, then the rule $f(P^o_x) = f(x)$ makes $f$ into a 0-form
field. The requirement of antisymmetry then says that $f(-P^o_x) = -f(x)$.


1-form fields. Let $\vec F$ be a vector field on an open subset $U \subset \mathbb{R}^n$. We can
then associate to $\vec F$ a 1-form field $W_{\vec F}$, which we call the work form field:


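The 1-form field $x\, dx + y\, dy + z\, dz$ is the work form field of the vector
field $\vec F \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$, the radial vector field shown (in $\mathbb{R}^2$) in
Figure 1.1.6:

\[ (x\, dx + y\, dy + z\, dz)\bigl( P^o_x(\vec v) \bigr) = x \cdot \det[v_1] + y \cdot \det[v_2] + z \cdot \det[v_3] = \begin{bmatrix} x \\ y \\ z \end{bmatrix} \cdot \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}. \]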


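Definition 6.4.1 (Work form field). The work form field $W_{\vec F}$ of a vector
field $\vec F = \begin{bmatrix} F_1 \\ \vdots \\ F_n \end{bmatrix}$ is the 1-form field defined by

\[ W_{\vec F}\bigl( P^o_x(\vec v) \bigr) = \vec F(x) \cdot \vec v. \tag{6.4.1} \]

This can also be written in coordinates: the work form field $W_{\vec F}$ of a vector
field $\vec F = \begin{bmatrix} F_1 \\ \vdots \\ F_n \end{bmatrix}$ is the 1-form field $F_1\, dx_1 + \dots + F_n\, dx_n$. Indeed,

\[ (F_1\, dx_1 + \dots + F_n\, dx_n)\bigl( P^o_x(\vec v) \bigr) = \bigl( F_1(x)\, dx_1 + \dots + F_n(x)\, dx_n \bigr) \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix} = F_1(x) v_1 + \dots + F_n(x) v_n = \vec F(x) \cdot \vec v. \]

In this form, it is clear from Theorem 6.2.7 that every 1-form field on $U$ is the
work form field of some vector field.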

What have we gained by saying that a 1-form field is the work form field
of a vector field? Mainly that it is quite easy to visualize $W_{\vec F}$ and to understand
what it measures: if $\vec F$ is a force field, its work form field associates to a little
line segment the work that the force field does along the line segment. To really
understand this you need a little bit of physics, but even without it you can see
what it means. Suppose for instance that $\vec F$ is the force field of gravity. In the
absence of friction, it requires no work to push a wagon of mass $m$ horizontally
from $a$ to $b$; the vector $\vec b - \vec a$ and the constant vector field representing gravity
In Equation 6.4.2, $g$ represents the acceleration of gravity at the surface of
the earth, and $m$ is mass; $-gm$ is weight, a force; it is negative because it
points down.

The unit of a force field such as the gravitation field is energy per length.
It's the per length that tells us that the integrand to associate to a force
field should be something to be integrated over curves. Since direction makes a
difference (it takes work to push a wagon uphill but the wagon rolls down by
itself), the appropriate integrand is a 1-form field, which is integrated over
oriented curves.


are orthogonal to each other, with dot product zero:

\[ \begin{bmatrix} 0 \\ 0 \\ -gm \end{bmatrix} \cdot \begin{bmatrix} b_1 - a_1 \\ b_2 - a_2 \\ 0 \end{bmatrix} = 0. \tag{6.4.2} \]

But if the wagon rolls down an inclined plane, the force field of gravity does
“work” on the wagon equal to the dot product of gravity and the displacement
vector of the wagon:

\[ \begin{bmatrix} 0 \\ 0 \\ -gm \end{bmatrix} \cdot \begin{bmatrix} b_1 - a_1 \\ b_2 - a_2 \\ b_3 - a_3 \end{bmatrix} = -gm(b_3 - a_3), \tag{6.4.3} \]

which is positive, since $b_3 - a_3$ is negative. If you want to push the wagon back
up the inclined plane, you will need to furnish the work, and the force field of
gravity will do negative work.

For what vector field $\vec F$ can the 1-form field $x_2\, dx_1 + x_2 x_4\, dx_2 + x_1^2\, dx_4$ be
written as $W_{\vec F}$?$^4$


The $\Phi$ in the flux form field $\Phi_{\vec F}$ is of course unrelated to the $\Phi$ of the
“concrete to abstract” function $\Phi_{\{v\}}$ introduced in Section 2.6.


It may be easier to remember the coordinate definition of $\Phi_{\vec F}$ if it is
written

2-forms. If $\vec F$ is a vector field on an open subset $U \subset \mathbb{R}^3$, then we can associate
to it a 2-form field on $U$ called its flux form field $\Phi_{\vec F}$, which we first saw in
Example 6.1.1.

Definition 6.4.2 (Flux form field). The flux form field $\Phi_{\vec F}$ is the 2-form
field defined by

\[ \Phi_{\vec F}\bigl( P^o_x(\vec v, \vec w) \bigr) = \det[\vec F(x), \vec v, \vec w]. \tag{6.4.4} \]


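\[ F_1\, dy \wedge dz + F_2\, dz \wedge dx + F_3\, dx \wedge dy \]

(changing the order and sign of the middle term). Then (for $x = 1$, $y = 2$,
$z = 3$) you can think that the first term goes (1,2,3), the second (2,3,1), and
the third (3,1,2). For instance, the flux form field of the radial vector field
$\begin{bmatrix} x \\ y \\ z \end{bmatrix}$ is

\[ x\, dy \wedge dz + y\, dz \wedge dx + z\, dx \wedge dy. \]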


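In coordinates, this becomes $\Phi_{\vec F} = F_1\, dy \wedge dz - F_2\, dx \wedge dz + F_3\, dx \wedge dy$:

\[ \begin{aligned}
(F_1\, dy \wedge dz - F_2\, dx \wedge dz + F_3\, dx \wedge dy) \left( P^o_x \left( \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}, \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} \right) \right)
&= F_1(x)(v_2 w_3 - v_3 w_2) - F_2(x)(v_1 w_3 - v_3 w_1) + F_3(x)(v_1 w_2 - v_2 w_1) \\
&= \det[\vec F(x), \vec v, \vec w].
\end{aligned} \tag{6.4.5} \]

In this form, it is clear, again from Theorem 6.2.7, that all 2-form fields on $\mathbb{R}^3$
are flux form fields of a vector field: the flux form field is a linear combination of
all the elementary 2-forms on $\mathbb{R}^3$, so it is just a question of using the coefficients
of the elementary forms to make a vector field.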
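The agreement between the coordinate expressions and the geometric definitions (dot product for work, determinant for flux) can be spot-checked numerically. The sketch below is illustrative only: the function names, the radial test field, and the sample point and vectors are all assumptions chosen for the check.

```python
def work_form(F, x, v):
    """W_F evaluated on the vector v anchored at x: F1*v1 + ... + Fn*vn."""
    return sum(Fi * vi for Fi, vi in zip(F(x), v))

def flux_form(F, x, v, w):
    """Coordinate formula F1 dy^dz - F2 dx^dz + F3 dx^dy on (v, w) anchored at x."""
    F1, F2, F3 = F(x)
    return (F1 * (v[1] * w[2] - v[2] * w[1])
            - F2 * (v[0] * w[2] - v[2] * w[0])
            + F3 * (v[0] * w[1] - v[1] * w[0]))

def det3(a, b, c):
    """Determinant of the 3x3 matrix whose columns are a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - b[0] * (a[1] * c[2] - a[2] * c[1])
            + c[0] * (a[1] * b[2] - a[2] * b[1]))

radial = lambda x: x                 # the radial field F(x) = x, as a test case
x = (1, 2, 3)                        # arbitrary anchor point
v, w = (1, 0, 1), (2, 2, 3)          # arbitrary spanning vectors

work = work_form(radial, x, v)               # F(x) . v
flux = flux_form(radial, x, v, w)            # coordinate formula
flux_det = det3(radial(x), v, w)             # det[F(x), v, w]
print(work, flux, flux_det)
```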


4 It is the work form field of the vector field $\vec F \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{bmatrix} x_2 \\ x_2 x_4 \\ 0 \\ x_1^2 \end{bmatrix}$.
If a vector field represents the flow of a fluid, what units will it have?
Clearly the vector field measures how much fluid flows through a unit surface
perpendicular to the direction of flow in unit time: the units should be
mass/(length$^2$). The length$^2$ in the denominator tips us off that the
appropriate integrand to associate to this vector field is a 2-form, or at least
an integrand to be integrated over a surface. You might go one step further, and
say it is a 3-form on space-time: the result of integrating it over a surface in
space and an interval in time is the total mass flowing through that region of
spacetime. In general, any $(n-1)$-form field in $\mathbb{R}^n$ can be considered a flux
form field.


Recall that $\rho$ is the Greek letter “rho.”

The 3-form $dx \wedge dy \wedge dz$ is another name for the determinant:

\[ dx \wedge dy \wedge dz(\vec v_1, \vec v_2, \vec v_3) = \det[\vec v_1, \vec v_2, \vec v_3]. \]


The characteristic of functions which really should be considered as densities
is that they have units something/cubic length, such as ordinary density
(kg/m$^3$) or charge density (coulombs/m$^3$).


Once more, what we have gained is an ability to visualize, as suggested by 
Figure 6.4.1: the flux form field of a vector field associates to a parallelogram 
the flow of the vector field through the parallelogram. 



FIGURE 6.4.1. The flow of $\vec F$ through a surface depends on the angle between $\vec F$
and the surface. Left: $\vec F$ is orthogonal to the surface, providing maximum flow. This
corresponds to $\vec F(x)$ being perpendicular to the parallelogram spanned by $\vec v$, $\vec w$. (The
volume of the parallelepiped is $\det[\vec F, \vec v, \vec w] = \vec F \cdot (\vec v \times \vec w)$, which is greatest when
the angle $\theta$ between $\vec F$ and $\vec v \times \vec w$ is 0, since $\vec x \cdot \vec y = |\vec x||\vec y| \cos\theta$.) Middle: $\vec F$ is not
orthogonal to the surface, allowing less flow. Right: $\vec F$ is parallel to the surface; the
flow is 0. In this case $P^o_x(\vec F(x), \vec v, \vec w)$ is flat. This corresponds to $\vec F \cdot (\vec v \times \vec w) = 0$, i.e.,
$\vec F$ is perpendicular to $\vec v \times \vec w$ and therefore parallel to the parallelogram spanned by $\vec v$
and $\vec w$.

If $\vec F$ is the velocity vector field of a fluid, the integral of its flux form field
over a surface measures the amount of fluid flowing through the surface. Indeed,
the fluid which flows through the parallelogram $P^o_x(\vec v, \vec w)$ in unit time will fill
the parallelepiped $P^o_x(\vec F(x), \vec v, \vec w)$: the particle which at time 0 was at the
corner $x$ is now at $x + \vec F(x)$. The sign is positive if $\vec F$ is on the same side of
the parallelogram as $\vec v \times \vec w$, otherwise negative (and 0 if $\vec F$ is parallel to the
parallelogram; indeed, nothing flows through it then).

3-forms. Any 3-form on an open subset of $\mathbb{R}^3$ is the 3-form $dx \wedge dy \wedge dz$ (alias
the determinant) multiplied by a function: we will denote by $\rho_f$ the 3-form
$f\, dx \wedge dy \wedge dz$, and call it the density of $f$.

Definition 6.4.3 (Density form of a function). Let $U \subset \mathbb{R}^3$ be open.
The density form $\rho_f$ of a function $f : U \to \mathbb{R}$ is the 3-form defined by

\[ \underbrace{\rho_f}_{\text{density}} \bigl( P^o_x(\vec v_1, \vec v_2, \vec v_3) \bigr) = f(x) \underbrace{\det[\vec v_1, \vec v_2, \vec v_3]}_{\text{signed volume of } P}. \tag{6.4.6} \]
To keep straight the order of the elementary 2-forms in Equation 6.4.8, think
that the first one omits $dx$, the second omits $dy$ and the third omits $dz$.

Recall that the work form field $W_{\vec F}$ of a vector field $\vec F$ is the 1-form field

\[ W_{\vec F}\bigl( P^o_x(\vec v) \bigr) = \vec F(x) \cdot \vec v. \]


Note that the vectors $\vec v$, $\vec v_1$, $\vec v_2$, and $\vec v_3$ of the definitions for the work
form, flux form, and density form, are replaced in the integration formulas by
derivatives of the parametrizations: i.e., by $\gamma'(t)$ and $[D\gamma(u)]$.


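Summary: work, flux, and density forms on $\mathbb{R}^3$

Let $f$ be a function on $\mathbb{R}^3$ and $\vec F = \begin{bmatrix} F_1 \\ F_2 \\ F_3 \end{bmatrix}$ be a vector field. Then

\[ W_{\vec F} = F_1\, dx + F_2\, dy + F_3\, dz \tag{6.4.7} \]
\[ \Phi_{\vec F} = F_1\, dy \wedge dz - F_2\, dx \wedge dz + F_3\, dx \wedge dy \tag{6.4.8} \]
\[ \rho_f = f\, dx \wedge dy \wedge dz. \tag{6.4.9} \]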


Integrating work, flux and density form fields over parametrized 
domains 

Now let us translate Definition 6.3.4 (integrating a $k$-form field over a
parametrized domain) into the language of vector calculus.


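Example 6.4.4 (Integrating a work form field over a parametrized
curve). When integrating the work form field over a parametrized curve
$\gamma(A) = C$, the equation of Definition 6.3.4:

\[ \int_{\gamma(A)} \varphi = \int_A \varphi\left( P^o_{\gamma(u)}\left( \vec{D_1}\gamma(u), \dots, \vec{D_k}\gamma(u) \right) \right) |d^k u| \tag{6.3.8} \]

becomes

\[ \int_{\gamma(A)} W_{\vec F} = \int_A W_{\vec F}\bigl( P^o_{\gamma(u)}(\gamma'(u)) \bigr) |du| = \int_A \vec F(\gamma(u)) \cdot \gamma'(u)\, |du|. \tag{6.4.10} \]

This integral measures the work of the force field $\vec F$ along the curve. △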


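Example 6.4.5 (Integrating a work form field over a helix). What is
the work of the vector field

\[ \vec F \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{bmatrix} y \\ -x \\ 0 \end{bmatrix} \tag{6.4.11} \]

over the helix parametrized by

\[ \gamma(t) = \begin{pmatrix} \cos t \\ \sin t \\ t \end{pmatrix}, \quad 0 \le t \le 4\pi? \tag{6.4.12} \]

By Equation 6.4.10 this is

\[ \int_0^{4\pi} \begin{bmatrix} \sin t \\ -\cos t \\ 0 \end{bmatrix} \cdot \begin{bmatrix} -\sin t \\ \cos t \\ 1 \end{bmatrix} dt = \int_0^{4\pi} (-\sin^2 t - \cos^2 t)\, dt = -4\pi. \tag{6.4.13} \]

△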
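Equation 6.4.10 can be approximated by the same kind of Riemann sum used to motivate Definition 6.3.4. The sketch below applies it to the helix; the field $(y, -x, 0)$ is the one consistent with the integrand of Equation 6.4.13, while the function name `work_along_curve` and the number of subintervals are illustrative assumptions.

```python
import math

def work_along_curve(F, gamma, dgamma, a, b, n=20_000):
    """Midpoint Riemann sum of F(gamma(t)) . gamma'(t) for t in [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        force = F(gamma(t))
        velocity = dgamma(t)
        total += sum(f * v for f, v in zip(force, velocity)) * h
    return total

F = lambda p: (p[1], -p[0], 0.0)                        # field (y, -x, 0)
gamma = lambda t: (math.cos(t), math.sin(t), t)         # the helix
dgamma = lambda t: (-math.sin(t), math.cos(t), 1.0)     # its derivative

approx = work_along_curve(F, gamma, dgamma, 0.0, 4.0 * math.pi)
print(approx, -4.0 * math.pi)
```

The work is negative because the field circulates clockwise (seen from above) while the helix winds counterclockwise, so the force opposes the motion at every point.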
The flux form field $\Phi_{\vec F}$ of a vector field $\vec F$ is the 2-form field

\[ \Phi_{\vec F}\bigl( P^o_x(\vec v, \vec w) \bigr) = \det[\vec F(x), \vec v, \vec w]. \]

If $\vec F$ is the velocity vector field of a fluid, the integral of a flux form
field measures the amount of fluid flowing through a surface.


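Example 6.4.6 (Integrating a flux form field over a parametrized
surface). Let $U$ be a subset of $\mathbb{R}^2$, $\gamma : U \to \mathbb{R}^3$ be a parametrized domain, and $\vec F$
a vector field defined on a neighborhood of $S = \gamma(U)$. Then

\[ \int_{\gamma(U)} \Phi_{\vec F} = \int_U \Phi_{\vec F}\left( P^o_{\gamma(u)}\left( \vec{D_1}\gamma(u), \vec{D_2}\gamma(u) \right) \right) |d^2 u|
 = \int_U \det[\vec F(\gamma(u)), \vec{D_1}\gamma, \vec{D_2}\gamma]\, |d^2 u|. \tag{6.4.14} \]

If $\vec F$ is the velocity vector field of a fluid, this integral measures the amount of
fluid flowing through the surface $S$. △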


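Example 6.4.7. The flux of the vector field $\vec F \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{bmatrix} x \\ y^2 \\ z \end{bmatrix}$ through the
parametrized domain $\gamma \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} u^2 \\ uv \\ v^2 \end{pmatrix}$, $0 \le u, v \le 1$ is

\[ \begin{aligned}
\int_0^1 \int_0^1 \det \begin{bmatrix} u^2 & 2u & 0 \\ u^2 v^2 & v & u \\ v^2 & 0 & 2v \end{bmatrix} du\, dv
&= \int_0^1 \int_0^1 (2u^2 v^2 - 4u^3 v^3 + 2u^2 v^2)\, du\, dv \\
&= \int_0^1 \left[ \frac{4}{3} u^3 v^2 - u^4 v^3 \right]_{u=0}^{u=1} dv = \int_0^1 \left( \frac{4}{3} v^2 - v^3 \right) dv = \frac{7}{36}.
\end{aligned} \tag{6.4.15} \]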


The density form field $\rho_f$ is the 3-form

\[ \rho_f\bigl( P^o_x(\vec v_1, \vec v_2, \vec v_3) \bigr) = f(x) \det[\vec v_1, \vec v_2, \vec v_3]. \]

In coordinates, $\rho_f$ is written $f(x)\, dx \wedge dy \wedge dz$.


Example 6.4.8 (Integrating a density form field over a parametrized
piece of $\mathbb{R}^3$). Let $U, V \subset \mathbb{R}^3$ be open sets, and $\gamma : U \to V$ be a $C^1$ mapping.
If $f : V \to \mathbb{R}$ is a function then

\[ \int_{\gamma(U)} \rho_f = \int_U \rho_f\left( P^o_{\gamma(u)}\left( \vec{D_1}\gamma(u), \vec{D_2}\gamma(u), \vec{D_3}\gamma(u) \right) \right) |d^3 u|
 = \int_U f(\gamma(u)) \det[D\gamma(u)]\, |d^3 u|. \tag{6.4.16} \]

There is a particularly important special case of such a mapping $\gamma : U \to V$:
the case where $V = U$ and $\gamma(x) = x$ is the identity. In that case, the formula
for integrating a density form field becomes

\[ \int_{\gamma(U)} \rho_f = \int_U f(u)\, |d^3 u|, \tag{6.4.17} \]

i.e., the integral of $\rho_f$ is simply what we had called the integral of $f$ in Section
4.2. If $f$ is the density of some object, then this integral measures its mass.
△
Example 6.4.9 (Integrating a density form). Let $f$ be the function

\[ f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = x^2 + y^2, \tag{6.4.18} \]


Figure 6.4.2. 

This torus was discussed in Ex- 
ample 5.4.2. This time we are in- 
terested in the solid torus. 


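and for $r < R$, let $T_{r,R}$ be the torus obtained by rotating the circle of radius $r$
centered at $(R, 0)$ in the $(x, z)$-plane around the $z$-axis, shown in Figure 6.4.2.

Compute the integral of $\rho_f$ over the region bounded by $T_{r,R}$ (i.e., the inside
of the torus). Here, using the identity parametrization would lead to quite a
clumsy integral. The following parametrization, with $0 \le u \le r$, $0 \le v, w \le 2\pi$,
is better adapted:

\[ \gamma\begin{pmatrix} u \\ v \\ w \end{pmatrix} = \begin{pmatrix} (R + u\cos v)\cos w \\ (R + u\cos v)\sin w \\ u \sin v \end{pmatrix}. \tag{6.4.19} \]

The integral becomes

\[ \begin{aligned} &\int_0^{2\pi}\!\!\int_0^{2\pi}\!\!\int_0^r -(R + u\cos v)^2\, u(R + u\cos v)\,du\,dv\,dw \\ &\quad= -2\pi \int_0^{2\pi}\!\!\int_0^r (R^3 u + 3R^2u^2\cos v + 3Ru^3\cos^2 v + u^4\cos^3 v)\,du\,dv \\ &\quad= -2\pi \int_0^{2\pi} \Bigl(\frac{R^3r^2}{2} + R^2r^3\cos v + \frac{3Rr^4\cos^2 v}{4} + \frac{r^5\cos^3 v}{5}\Bigr)\,dv \\ &\quad= -\pi^2\Bigl(2R^3r^2 + \frac{3Rr^4}{2}\Bigr). \quad △ \end{aligned} \tag{6.4.20} \]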
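The closed form in Equation 6.4.20 can be checked numerically. The sketch below evaluates the double integral in $u$ and $v$ with a midpoint rule (the $w$ integration only contributes a factor $2\pi$) and compares it with $-\pi^2(2R^3r^2 + 3Rr^4/2)$. The sample radii, the grid size and the function names are assumptions made for illustration.

```python
import math

def torus_density_integral(R, r, n=200):
    """Midpoint-rule value of the integral in Equation 6.4.20 (first line)."""
    hu = r / n
    hv = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * hu
        for j in range(n):
            v = (j + 0.5) * hv
            a = R + u * math.cos(v)
            # Integrand: f(gamma) * det[D gamma] = (R + u cos v)^2 * (-u (R + u cos v)).
            total += -(a ** 2) * u * a * hu * hv
    return 2 * math.pi * total  # the integrand does not depend on w

def closed_form(R, r):
    """Right-hand side of the last line of Equation 6.4.20."""
    return -math.pi ** 2 * (2 * R ** 3 * r ** 2 + 1.5 * R * r ** 4)

approx = torus_density_integral(2.0, 1.0)
exact = closed_form(2.0, 1.0)
print(approx, exact)
```

Both values are negative: with the variables ordered $(u, v, w)$, the determinant of the derivative of the parametrization is $-u(R + u\cos v)$.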


When computing integrals by 
hand, the choice of parametriza- 
tion can make a big difference in 
how hard it is. It’s always a good 
idea to choose a parametrization 
that reflects the symmetries of the 
problem. Here the torus is symmetrical
around the $z$-axis; Equation
6.4.19 reflects that symmetry.


You might wonder whether this has anything to do with the integral we would
have obtained if we had used the identity parametrization. A priori, it doesn't,
but actually if you look carefully, you will see that there is a computation
of $\det[D\gamma]$, and therefore that the change of variables formula might well
relate the two integrals, and this is true. But the absolute value that
appears in the change of variables formula isn't present here, and it matters:
the determinant here is $-u(R + u\cos v)$, which is negative, so the result above
is the negative of what the identity parametrization gives. Really figuring out
when the absolute value is needed will be a lengthy story, involving a precise
definition of orientation.

Work, flux and density in R n 

In all dimensions, 


(1) 0-form fields are functions. 

(2) Every 1-form field is the work form field of a vector field. 

(3) Every $(n-1)$-form field is the flux form field of a vector field.

(4) Every $n$-form field is the density form field of a function.
We've already seen this for 0-form fields and 1-form fields. In $\mathbb R^3$, the flux
form field is of course a $2 = (n-1)$-form field; its definition can be generalized:


Exercise 6.4.8 asks you to verify
that $\det[\vec F(\mathbf x), \vec v_1, \dots, \vec v_{n-1}]$ is an
$(n-1)$-form field.


In the first term, where $i = 1$,
we omit $dx_1$; in the second term,
where $i = 2$, we omit $dx_2$, and so
on.


Definition 6.4.10 (Flux form field on $\mathbb R^n$). If $U \subset \mathbb R^n$ is an open subset
and $\vec F$ is a vector field on $U$, then the flux form field $\Phi_{\vec F}$ is the $(n-1)$-form
field defined by the formula

\[ \Phi_{\vec F}\bigl(P_{\mathbf x}(\vec v_1, \dots, \vec v_{n-1})\bigr) = \det[\vec F(\mathbf x), \vec v_1, \dots, \vec v_{n-1}]. \tag{6.4.21} \]

In coordinates, this becomes

\[ \Phi_{\vec F} = \sum_{i=1}^{n} (-1)^{i-1} F_i\,dx_1 \wedge \dots \wedge \widehat{dx_i} \wedge \dots \wedge dx_n, \tag{6.4.22} \]

where the term under the hat is omitted.

For instance, the flux of the radial vector field $\vec F(x_1, \dots, x_n) = (x_1, \dots, x_n)$ is

\[ \Phi_{\vec F} = (x_1\,dx_2 \wedge \dots \wedge dx_n) - (x_2\,dx_1 \wedge dx_3 \wedge \dots \wedge dx_n) + \dots \pm x_n\,dx_1 \wedge \dots \wedge dx_{n-1}, \tag{6.4.23} \]

where the last term is positive if $n$ is odd, and negative if it is even.
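The agreement between the determinant definition of the flux form and its expression in coordinates can be tested on concrete vectors. The sketch below is only an illustration under stated assumptions: the helper names are invented here, the sample point and vectors are arbitrary, and the elementary form $dx_1 \wedge \dots \wedge \widehat{dx_i} \wedge \dots \wedge dx_n$ is evaluated as the determinant of the matrix $[\vec v_1, \dots, \vec v_{n-1}]$ with row $i$ deleted.

```python
from itertools import permutations

def det(rows):
    """Determinant by the Leibniz (permutation) formula; exact for integer entries."""
    n = len(rows)
    total = 0
    for perm in permutations(range(n)):
        # The sign of a permutation is the parity of its number of inversions.
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= rows[i][perm[i]]
        total += term
    return total

def flux_by_definition(F, vs):
    """det[F, v_1, ..., v_{n-1}], with F and the v_j as the columns of the matrix."""
    n = len(F)
    return det([[F[r]] + [v[r] for v in vs] for r in range(n)])

def flux_in_coordinates(F, vs):
    """Sum over i of (-1)^(i-1) F_i (dx_1 ^ ... ^ dx_n with dx_i omitted) applied to the v_j."""
    n = len(F)
    total = 0
    for i in range(n):  # i is 0-based here, so the sign (-1)^(i-1) of the text becomes (-1)^i
        minor = [[v[r] for v in vs] for r in range(n) if r != i]
        total += (-1) ** i * F[i] * det(minor)
    return total

x = [2, -1, 3, 5]                                   # sample point of R^4; radial field: F(x) = x
vs = [[1, 0, 2, -1], [0, 3, 1, 4], [2, 2, 0, 1]]    # three arbitrary vectors anchored at x
by_definition = flux_by_definition(x, vs)
in_coordinates = flux_in_coordinates(x, vs)
print(by_definition, in_coordinates)
```

The two numbers agree because the coordinate formula is the expansion of the determinant along its first column.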

In any dimension n, n-form fields are multiples of the determinant, so all 
n-form fields are densities of functions: 


In dimensions higher than 3,
some form fields cannot be expressed
in terms of vector fields
and functions: in particular, 2-forms
on $\mathbb R^4$, which are of great interest
in physics, since the electromagnetic
field is such a 2-form on
spacetime. The language of vector
calculus is not suited to describing
integrands over surfaces in higher
dimensions, while the language of
forms is.


Definition 6.4.11 (Density form field on $\mathbb R^n$). Let $U \subset \mathbb R^n$ be open.
The density form field $\rho_f$ of a function $f : U \to \mathbb R$ is given by

\[ \rho_f = f\,dx_1 \wedge \dots \wedge dx_n. \]


The correspondences between form fields, functions and vectors, summarized
in Table 6.4.3, explain why vector calculus works in $\mathbb R^3$, and why it doesn't
work in dimensions higher than 3. For $k$-forms on $\mathbb R^n$, when $k$ is anything
other than $0$, $1$, $n-1$, or $n$, there is no interpretation of form fields in terms of
functions or vector fields.

A particularly important example is the electromagnetic field, which is a
6-component object, and thus cannot be represented either as a function (a
1-component object) or a vector field (in $\mathbb R^4$, a 4-component object).
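The component counts quoted above can be tabulated. A $k$-form on $\mathbb R^n$ has $\binom{n}{k}$ components, one for each elementary form $dx_{i_1} \wedge \dots \wedge dx_{i_k}$. The sketch below matches that count against functions (1 component) and vector fields ($n$ components). The function name is invented for illustration, and matching counts is only a heuristic for the correspondence, not a proof of it.

```python
from math import comb

def counterpart(n, k):
    """Compare the component count of a k-form on R^n with functions and vector fields."""
    components = comb(n, k)  # number of elementary k-forms on R^n
    if components == 1:
        return "function"
    if components == n:
        return "vector field"
    return "no equivalent"

for n in (3, 4):
    for k in range(n + 1):
        print(n, k, comb(n, k), counterpart(n, k))
```

For $n = 4$ and $k = 2$ the count is 6, which is the electromagnetic field's case.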

The standard way of dealing with the problem is to choose coordinates 
x,y,z,t, in particular choosing a specific space-like subspace and a specific 
time-like subspace, quite likely those of your laboratory. Experiment indicates 
the following force law: there are two vector fields, E (the electric field) and 
The “c” in Equations 6.4.24
and 6.4.25 represents the speed of
light. It is necessary to put it in
so that $W_{\vec E} \wedge c\,dt$ and $\Phi_{\vec B}$ have
the same units: force/(charge ×
length²). We are using the cgs
system of units (centimeter, gram,
second). In this system the unit of
charge is the statcoulomb, which is
designed to remove constants from
Coulomb's law. (The mks system,
based on meters, kilograms and
seconds, uses a different unit of
charge, the coulomb, which results
in constants $\mu_0$ and $\epsilon_0$ that clutter
up the equations.) We could go
one step further and use the
mathematicians' privilege of choosing
units arbitrarily, setting $c = 1$, but
that offends intuition.


Exercise 6.8.8 asks you to use 
form fields to write Maxwell’s 
laws. 


$\vec B$ (the magnetic field), with the property that a charge $q$ at $(\mathbf x, t)$ and with
velocity $\vec v$ (in the laboratory coordinates) is subject to the force

\[ q\Bigl(\vec E(\mathbf x) + \frac{\vec v}{c} \times \vec B(\mathbf x)\Bigr). \tag{6.4.24} \]

But $\vec E$ and $\vec B$ are not really vector fields. A true vector field keeps its
individuality when you change coordinates. In particular, if a vector field is 0 in one
coordinate system, it will be 0 in every coordinate system. This is not true of
the electric and magnetic fields. If in one coordinate system the charge is at
rest and the electric field is 0, then the particle will not be accelerated in those
coordinates. In another system moving at constant velocity with respect to the
first (on a train rolling through the laboratory, for instance) it will still not be
accelerated. But it now feels a force from the magnetic field, which must be
compensated for by an electric field, which cannot now be zero.

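Is there something natural that the electric field and the magnetic field
together represent? The answer is yes: there is a 2-form field on $\mathbb R^4$, namely

\[ \begin{aligned} &E_x\,dx \wedge c\,dt + E_y\,dy \wedge c\,dt + E_z\,dz \wedge c\,dt + B_x\,dy \wedge dz + B_y\,dz \wedge dx + B_z\,dx \wedge dy \\ &\quad= W_{\vec E} \wedge c\,dt + \Phi_{\vec B}. \end{aligned} \tag{6.4.25} \]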

This 2-form field, which the distinguished physicists Charles Misner, Kip 
Thorne, and J. Archibald Wheeler call the Faraday (in their book Gravita- 
tion, the bible of general relativity), is really a natural object, the same in 
every inertial frame. Thus form fields are really the natural language in which 
to write Maxwell’s equations. 


Form fields        | Vector calculus, R^3                 | Vector calculus, R^n
-------------------+--------------------------------------+---------------------
0-form field       | Function                             | Function
1-form field       | Vector field (via work form field)   | Vector field
(n - 2)-form field | Same as 1-form                       | No equivalent
(n - 1)-form field | Vector field (via flux form field)   | Vector field
n-form field       | Function (via density form field)    | Function


Figure 6.4.3. Correspondence between forms and vector calculus. In all dimensions,
0-form fields, 1-form fields, (n - 1)-form fields, and n-form fields can be identified with
a vector field or a function. Other form fields have no equivalent in vector calculus.
We found this argument by working
through the proof of the statement
that integrals over manifolds
with respect to $|d^k\mathbf x|$ are independent
of parametrization (Proposition
5.5.1, proved for surfaces in
Equation 5.4.16) and noting the
differences. You may find it
instructive to compare the two
arguments. Superficially, the equations
may seem very different, but
note the similarities. The first line
of Equation 5.4.16 corresponds to
the right-hand side of the first line
of Equation 6.5.1; in both we have

\[ \int_V (\cdots)\,|d^k\mathbf v|. \]

In the second lines of both equations,
we have

\[ \int_U (\cdots)\,|\det[D\Phi(\mathbf u)]|. \]


6.5 Orientation and Integration of Form Fields 

“. . . the great thing in this world is not so much where we stand, as
in what direction we are moving.” -Oliver Wendell Holmes


Compatible orientations of parametrized manifolds 


We have discussed how to integrate $k$-form fields over $k$-dimensional parametrized
domains. We have seen that where integrands like $|d^k\mathbf x|$ are concerned,
the integral does not depend on the parametrization. Is this still true for form
fields? The answer is “not quite”: for two parametrizations to give the same
result, they have to induce the same orientation on the image.

Let us see this by trying to prove the (false) statement that the integral does
not depend on the parametrization, and discovering where we go wrong. Let
$M \subset \mathbb R^n$ be a $k$-dimensional manifold, $U, V$ be subsets of $\mathbb R^k$, and $\gamma_1 : U \to M$,
$\gamma_2 : V \to M$ be two parametrizations, each inducing its own orientation.
Let $\varphi$ be a $k$-form on a neighborhood of $M$.

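Define as in Theorem 5.2.8 the “change of parameters” map $\Phi = \gamma_2^{-1} \circ \gamma_1 : U^{ok} \to V^{ok}$.
Then Definition 6.3.4 (integrating a $k$-form field over a parametrized domain)
and the change of variables formula give

\[ \begin{aligned} \int_{\gamma_2(V)} \varphi &= \int_V \varphi\bigl(P_{\gamma_2(\mathbf v)}(D_1\gamma_2(\mathbf v), \dots, D_k\gamma_2(\mathbf v))\bigr)\,|d^k\mathbf v| \\ &= \int_U \varphi\bigl(P_{\gamma_2 \circ \Phi(\mathbf u)}(D_1\gamma_2(\Phi(\mathbf u)), \dots, D_k\gamma_2(\Phi(\mathbf u)))\bigr)\,|\det[D\Phi(\mathbf u)]|\,|d^k\mathbf u|. \end{aligned} \tag{6.5.1} \]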

We want to express everything in terms of $\gamma_1$. There is no trouble with the
point $(\gamma_2 \circ \Phi)(\mathbf u) = \gamma_1(\mathbf u)$ where the parallelogram is anchored, but the vectors
which span it are more troublesome, and will require the following lemma.


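Lemma 6.5.1. If $\vec w_1, \dots, \vec w_k$ are any $k$ vectors in $\mathbb R^k$, then

\[ \begin{aligned} &\varphi\bigl(P_{\gamma_2(\mathbf v)}(D_1\gamma_2(\mathbf v), \dots, D_k\gamma_2(\mathbf v))\bigr) \det[\vec w_1, \dots, \vec w_k] \\ &\quad= \varphi\bigl(P_{\gamma_2(\mathbf v)}([D\gamma_2(\mathbf v)]\vec w_1, \dots, [D\gamma_2(\mathbf v)]\vec w_k)\bigr). \end{aligned} \tag{6.5.2} \]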


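Proof. Since the vectors $[D\gamma_2(\mathbf v)]\vec w_1, \dots, [D\gamma_2(\mathbf v)]\vec w_k$ in the second line of
Equation 6.5.2 depend on $\vec w_1, \dots, \vec w_k$, we can consider the entire right-hand side
of that line as a function of $\mathbf v$ and $\vec w_1, \dots, \vec w_k$, multilinear and antisymmetric
with respect to the $\vec w$. The latter are $k$ vectors in $\mathbb R^k$, so the right-hand side
can be written as a multiple of the determinant: $a(\mathbf v) \det[\vec w_1, \dots, \vec w_k]$ for some
function $a(\mathbf v)$.

To find $a(\mathbf v)$, we set $\vec w_1, \dots, \vec w_k = \vec e_1, \dots, \vec e_k$. Since $[D\gamma_2(\mathbf v)]\vec e_i = D_i\gamma_2(\mathbf v)$,
substituting $\vec e_1, \dots, \vec e_k$ for $\vec w_1, \dots, \vec w_k$ in the second line of Equation 6.5.2 gives

\[ \begin{aligned} \varphi\bigl(P_{\gamma_2(\mathbf v)}([D\gamma_2(\mathbf v)]\vec e_1, \dots, [D\gamma_2(\mathbf v)]\vec e_k)\bigr) &= \varphi\bigl(P_{\gamma_2(\mathbf v)}(D_1\gamma_2(\mathbf v), \dots, D_k\gamma_2(\mathbf v))\bigr) \\ &= a(\mathbf v) \det[\vec e_1, \dots, \vec e_k] = a(\mathbf v). \end{aligned} \tag{6.5.3} \]

So

\[ \begin{aligned} \varphi\bigl(P_{\gamma_2(\mathbf v)}([D\gamma_2(\mathbf v)]\vec w_1, \dots, [D\gamma_2(\mathbf v)]\vec w_k)\bigr) &= a(\mathbf v) \det[\vec w_1, \dots, \vec w_k] \\ &= \varphi\bigl(P_{\gamma_2(\mathbf v)}(D_1\gamma_2(\mathbf v), \dots, D_k\gamma_2(\mathbf v))\bigr) \det[\vec w_1, \dots, \vec w_k]. \quad \Box \end{aligned} \tag{6.5.4} \]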


Now we write down the function being integrated on the second line of Equation
6.5.1, except that we take $\det[D\Phi(\mathbf u)]$ out of absolute value signs, so that
we will be able to apply Lemma 6.5.1 to go from the second to the third line:


The first line of Equation 6.5.5
is the function being integrated on
the second line of Equation 6.5.1:
everything between the $\int_U$ and the
$|d^k\mathbf u|$, with the important difference
that here the $\det[D\Phi(\mathbf u)]$ is
not between absolute value signs.

The second line is identical to
the first, except that $\det[D\Phi(\mathbf u)]$
has been rewritten in terms of the
partial derivatives.


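\[ \begin{aligned} &\varphi\bigl(P_{\gamma_2 \circ \Phi(\mathbf u)}(D_1\gamma_2(\Phi(\mathbf u)), \dots, D_k\gamma_2(\Phi(\mathbf u)))\bigr) \det[D\Phi(\mathbf u)] \\ &\quad= \varphi\bigl(P_{\gamma_2(\underbrace{\scriptstyle\Phi(\mathbf u)}_{\mathbf v})}(D_1\gamma_2(\Phi(\mathbf u)), \dots, D_k\gamma_2(\Phi(\mathbf u)))\bigr) \det[\underbrace{D_1\Phi(\mathbf u)}_{\vec w_1}, \dots, \underbrace{D_k\Phi(\mathbf u)}_{\vec w_k}] \\ &\quad= \varphi\bigl(P_{\gamma_2 \circ \Phi(\mathbf u)}([D\gamma_2(\Phi(\mathbf u))](D_1\Phi(\mathbf u)), \dots, [D\gamma_2(\Phi(\mathbf u))](D_k\Phi(\mathbf u)))\bigr) \\ &\quad= \varphi\bigl(P_{\gamma_1(\mathbf u)}(D_1\gamma_1(\mathbf u), \dots, D_k\gamma_1(\mathbf u))\bigr). \end{aligned} \tag{6.5.5} \]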

To pass from the second to the third line of Equation 6.5.5 we use Lemma
6.5.1, setting $\vec w_j = D_j\Phi(\mathbf u)$ and $\mathbf v = \Phi(\mathbf u)$. (We have marked some of these
correspondences with underbraces.) We use the chain rule to go from the third
to the fourth line.

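Now we come to the key point. The second line of Equation 6.5.1 has
$|\det[D\Phi(\mathbf u)]|$, while the first line of Equation 6.5.5 has $\det[D\Phi(\mathbf u)]$. Therefore
the integral

\[ \int_U \varphi\bigl(P_{\gamma_1(\mathbf u)}(D_1\gamma_1(\mathbf u), \dots, D_k\gamma_1(\mathbf u))\bigr)\,|d^k\mathbf u| \tag{6.5.6} \]

obtained using $\gamma_1$ and the integral

\[ \int_V \varphi\bigl(P_{\gamma_2(\mathbf v)}(D_1\gamma_2(\mathbf v), \dots, D_k\gamma_2(\mathbf v))\bigr)\,|d^k\mathbf v| \tag{6.5.7} \]

obtained using $\gamma_2$ will be the same only if $|\det[D\Phi(\mathbf u)]| = \det[D\Phi(\mathbf u)]$. That
is, they will be identical if $\det[D\Phi] > 0$ for all $\mathbf u \in U$, and otherwise probably
not. If $\det[D\Phi] < 0$ for all $\mathbf u \in U$ then

\[ \int_{\gamma_1(U)} \varphi = -\int_{\gamma_2(V)} \varphi. \tag{6.5.8} \]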
If $\det[D\Phi(\mathbf u)]$ is positive in some regions of $U$ and negative in others, then the
integrals are probably unrelated.
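A one-dimensional case makes the sign change concrete. In the sketch below, the upper unit semicircle is parametrized twice, by $\gamma_1(t) = (\cos t, \sin t)$ and by $\gamma_2(s) = \gamma_1(\pi - s)$, so the change of parameters is $\Phi(t) = \pi - t$ with derivative $-1$. The field $F(x, y) = (-y, x)$, the curve, and all function names are hypothetical choices made for this illustration; they do not come from the text.

```python
import math

def field(x, y):
    # Hypothetical example field, chosen for illustration: F(x, y) = (-y, x).
    return (-y, x)

def gamma1(t):
    # Upper unit semicircle, traversed counterclockwise for t in [0, pi].
    return (math.cos(t), math.sin(t))

def gamma1_prime(t):
    return (-math.sin(t), math.cos(t))

def gamma2(s):
    # Same semicircle, gamma2(s) = gamma1(pi - s): traversed clockwise.
    return (-math.cos(s), math.sin(s))

def gamma2_prime(s):
    return (math.sin(s), math.cos(s))

def midpoint(g, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h

def work(curve, velocity):
    """Integral of the work form of F over the parametrized curve."""
    def integrand(t):
        fx, fy = field(*curve(t))
        vx, vy = velocity(t)
        return fx * vx + fy * vy
    return midpoint(integrand, 0.0, math.pi)

def length(velocity):
    """Integral of |d^1 x| (arc length), which ignores orientation."""
    return midpoint(lambda t: math.hypot(*velocity(t)), 0.0, math.pi)

work1, work2 = work(gamma1, gamma1_prime), work(gamma2, gamma2_prime)
length1, length2 = length(gamma1_prime), length(gamma2_prime)
print(work1, work2)      # the form integrals have opposite signs
print(length1, length2)  # the arc-length integrals agree
```

The arc-length integrals agree because they use $|\det[D\Phi]|$; the work integrals differ by a sign, as in Equation 6.5.8.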

If $\det[D\Phi] > 0$, we say that the two parametrizations of $M$ induce compatible
orientations of $M$.

In Definition 6.5.2, recall that
when $\gamma_1$ goes from $U$ to $M$, and
$\gamma_2$ from $V$ to $M$, $\Phi = \gamma_2^{-1} \circ \gamma_1$ is
only defined on $U$.

Definition 6.5.2 (Compatible orientation). Let $\gamma_1$ and $\gamma_2$ be two
parametrizations, with the “change of parameters” map $\Phi = \gamma_2^{-1} \circ \gamma_1$. The
two parametrizations $\gamma_1$ and $\gamma_2$ are compatible if $\det[D\Phi] > 0$.


This leads to the following theorem. 


Theorem 6.5.3 (Integral independent of compatible parametrizations).
Let $M \subset \mathbb R^n$ be a $k$-dimensional oriented manifold, $U, V$ open
subsets of $\mathbb R^k$, and $\gamma_1 : U \to \mathbb R^n$ and $\gamma_2 : V \to \mathbb R^n$ be two parametrizations of
$M$ that induce compatible orientations of $M$. Then for any $k$-form $\varphi$ defined
on a neighborhood of $M$,

\[ \int_{\gamma_1(U)} \varphi = \int_{\gamma_2(V)} \varphi. \tag{6.5.9} \]


Orientation of manifolds 

When using a parametrization to integrate a $k$-form field over an oriented
domain, clearly we must take into account the orientation induced by the 
parametrization. We would like to be able to relate this to some character- 
istic of the domain of integration itself. What kind of structure can we bestow 
on an oriented curve, surface, or higher-dimensional manifold that would enable 
us to decide how to check whether a parametrization is appropriate? 

There are two ways to approach the somewhat challenging topic of orientation.
One is the ad hoc approach: to limit the discussion to points, curves,
surfaces, and three-dimensional objects. This has the advantage of being more
concrete, and the disadvantage that the various definitions appear to have nothing
to do with each other. The other is the unified approach: to discuss orientation
of $k$-dimensional manifolds, showing how orientation of points, curves,
surfaces, etc., are embodiments of a general definition. This has the disadvantage
of being abstract. We will present the ad hoc approach first, followed by
the unified theory.

The ad hoc world: orienting the objects 

We will treat orientations of the objects first, followed by orientation-preserving 
parametrizations. 
Definition 6.5.4 (Orientation of a point). An orientation of a point is
a choice of ±: an oriented point is “plus the point” or “minus the point.”



It is easy to understand orientations of curves (in any R n ): give a direction 
to go along the curve. The following definition is a more formal way of saying 
the same thing; it is illustrated in Figure 6.5.1. By “unit tangent vector field” 
we mean a field of vectors tangent to the curve and of length 1 . 

Definition 6.5.5 (Orientation of a curve in $\mathbb R^n$). An orientation of a
curve $C \subset \mathbb R^n$ is the choice of a unit tangent vector field $\vec T$ that depends
continuously on $\mathbf x$.


Figure 6.5.1. 

A curve is oriented by the
choice of a unit tangent vector
field that depends continuously on
$\mathbf x$. We could give this curve the
opposite orientation by choosing
tangent vectors pointing in the
opposite direction.


We orient a surface S C R 3 by choosing a normal vector at every point, as 
shown in Figure 6.5.2 and defined more formally below. 

Definition 6.5.6 (Orientation of a surface in $\mathbb R^3$). To orient a surface
in $\mathbb R^3$, choose a unit vector field $\vec N$ orthogonal to the surface. At each point
$\mathbf x$ there are two vectors $\vec N(\mathbf x)$; choose one at each point, so that the vector
field $\vec N$ depends continuously on the point.


This is possible for an orientable surface like a sphere or a torus: choose either 
the outer-pointing normal or the inward-pointing normal. But it is impossible 
on a Moebius strip. This definition does not extend at all easily to a surface in 
R 4 : at every point there is a whole normal plane, and choosing a normal vector 
field does not provide an orientation. 

Definition 6.5.7 (Orientation of open subsets of R 3 ). One orientation 
of an open subset X of R 3 is given by det; the opposite orientation is given 
by — det. The standard orientation is by det. 

We will use orientations to say whether three vectors $\vec v_1, \vec v_2, \vec v_3$ form a direct
basis of $\mathbb R^3$; with the standard orientation, $\vec v_1, \vec v_2, \vec v_3$ being direct means that
$\det[\vec v_1, \vec v_2, \vec v_3] > 0$. If we have drawn $\vec e_1, \vec e_2, \vec e_3$ in the standard way, so that
they fit the right hand, then $\vec v_1, \vec v_2, \vec v_3$ will be direct precisely if those vectors
also satisfy the right-hand rule.

Figure 6.5.2. 



To orient a surface, we choose
a normal vector field that depends 
continuously on x. (Recall that 
“normal” means “orthogonal.”) 


The unified approach: orienting the objects 

All three notions of orientation are reasonably intuitive, but they do not appear 

to have anything in common. Signs of points, directions on curves, normals 

to surfaces, right hands: how can we make all four be examples of a single 
construction? 
Recall (part (b) of Proposition
1.4.20) that the determinant of
three vectors is positive if they
satisfy the right-hand rule, and
negative otherwise.

Unlike the ad hoc definition of
orientation, which does not work
for a surface in $\mathbb R^4$, the unified
definition applies in all dimensions.


We will see that orienting manifolds means orienting their tangent spaces, so
before orienting manifolds we need to see how to orient vector spaces. We saw
in Section 6.2 (Corollary 6.2.12) that for any $k$-dimensional vector space $E$, the
space $A^k(E)$ of $k$-forms in $E$ has dimension one. Now we will use this space
to show that the different definitions of orientation we gave at the beginning of
this section are all special cases of a general definition.

Definition 6.5.8 (Orienting the space $A^k(E)$). The one-dimensional
space $A^k(E)$ is oriented by choosing a nonzero element $\omega$ of $A^k(E)$. An
element $a\omega$, with $a > 0$, gives the same orientation as $\omega$, while $b\omega$, with
$b < 0$, gives the opposite orientation.


Definition 6.5.9 (Orienting a finite-dimensional vector space). An
orientation of a $k$-dimensional vector space $E$ is specified by a nonzero
element of $A^k(E)$. Two nonzero elements specify the same orientation if one is
a multiple of the other by a positive number.

Definition 6.5.9 makes it clear that every finite-dimensional vector space (in
particular every subspace of $\mathbb R^n$) has two orientations.


Equivalence of the ad hoc and the unified approaches for sub- 
spaces of R 3 

Let $E \subset \mathbb R^n$ be a line, oriented in the ad hoc sense by a nonzero vector $\vec v \in E$,
and oriented in the unified sense by a nonzero element $\omega \in A^1(E)$. Then these
two orientations coincide precisely if $\omega(\vec v) > 0$.

For instance, if $E \subset \mathbb R^2$ is the line of equation $x + y = 0$, then the vector
$\begin{bmatrix} 1 \\ -1 \end{bmatrix}$ defines an ad hoc orientation, whereas $dx$ provides a unified orientation.
They do coincide: $dx\begin{bmatrix} 1 \\ -1 \end{bmatrix} = 1 > 0$. The element of $A^1(E)$ corresponding to
$dy$ also defines an orientation of $E$, in fact the opposite orientation. Why does
$dx + dy$ not define an orientation of this line?⁵

5 Because any vector in $E$ can be written $\begin{bmatrix} a \\ -a \end{bmatrix}$, and $(dx + dy)\begin{bmatrix} a \\ -a \end{bmatrix} = a - a = 0$, so $dx + dy$
corresponds to $0 \in A^1(E)$.

Now suppose that $E \subset \mathbb R^3$ is a plane, oriented “ad hoc” by a normal $\vec n$, and
oriented “unified” by $\omega \in A^2(E)$. Then the orientations coincide if for any two
vectors $\vec v_1, \vec v_2 \in E$, the number $\omega(\vec v_1, \vec v_2)$ is a positive multiple of $\det[\vec n, \vec v_1, \vec v_2]$.
For instance, suppose $E$ is the plane of equation $x + y + z = 0$, oriented “ad
hoc” by $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$, and oriented “unified” by $dx \wedge dy$. Any two vectors in $E$ can be
written

\[ \begin{bmatrix} a \\ b \\ -a-b \end{bmatrix}, \quad \begin{bmatrix} c \\ d \\ -c-d \end{bmatrix}, \tag{6.5.10} \]

so we have

unified approach:

\[ dx \wedge dy\left(\begin{bmatrix} a \\ b \\ -a-b \end{bmatrix}, \begin{bmatrix} c \\ d \\ -c-d \end{bmatrix}\right) = \det\begin{bmatrix} a & c \\ b & d \end{bmatrix} = ad - bc. \tag{6.5.11} \]

ad hoc approach:

\[ \det\begin{bmatrix} 1 & a & c \\ 1 & b & d \\ 1 & -a-b & -c-d \end{bmatrix} = 3(ad - bc). \tag{6.5.12} \]

These orientations coincide, since $3 > 0$. What if we had chosen $dy \wedge dz$ or
$dx \wedge dz$ as our nonzero element of $A^2(E)$?⁶
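The determinants in this comparison are easy to confirm with integer arithmetic. The sketch below builds two vectors of the plane $x + y + z = 0$ from sample values of $a, b, c, d$ (arbitrary choices, as are the helper names) and evaluates $dx \wedge dy$, $dy \wedge dz$, $dx \wedge dz$ and $\det[\vec n, \vec v_1, \vec v_2]$ on them.

```python
def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def det3(m):
    # Cofactor expansion along the first row.
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def plane_vectors(a, b, c, d):
    """Two vectors of the plane x + y + z = 0, written as in Equation 6.5.10."""
    return (a, b, -a - b), (c, d, -c - d)

def wedge(i, j, v1, v2):
    """Elementary 2-form dx_i ^ dx_j evaluated on (v1, v2); indices are 0-based."""
    return det2([[v1[i], v2[i]], [v1[j], v2[j]]])

a, b, c, d = 2, 5, -1, 3                      # arbitrary sample values
v1, v2 = plane_vectors(a, b, c, d)
n = (1, 1, 1)                                 # the normal used for the ad hoc orientation

dx_dy = wedge(0, 1, v1, v2)
dy_dz = wedge(1, 2, v1, v2)
dx_dz = wedge(0, 2, v1, v2)
ad_hoc = det3([[n[r], v1[r], v2[r]] for r in range(3)])
print(a * d - b * c, dx_dy, dy_dz, dx_dz, ad_hoc)   # 11 11 11 -11 33
```

So $dx \wedge dy$ and $dy \wedge dz$ agree with the ad hoc orientation on this plane, while $dx \wedge dz$ gives the opposite one.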

We see that in most cases the choice of orientation is arbitrary: the choice
of one nonzero element of $A^k(E)$ will give one orientation, while the choice of
another may well give the opposite orientation. But $\mathbb R^n$ itself and $\{0\}$ (the
zero subspace of $\mathbb R^n$) are exceptions; these two trivial subspaces of $\mathbb R^n$ do have
a standard orientation. For $\{0\}$, we have $A^0(\{0\}) = \mathbb R$, so one orientation is
specified by $+1$, the other by $-1$; the positive orientation is standard. The
trivial subspace $\mathbb R^n$ is oriented by $\omega = \det$; and $\det > 0$ is standard.


Orienting manifolds 

Most often we will be integrating a form over a curve, surface, or higher-dimensional
manifold, not simply over a line, plane, or $\mathbb R^3$. A $k$-manifold is
oriented by orienting $T_{\mathbf x}M$, the tangent space to the manifold at $\mathbf x$, for each
$\mathbf x \in M$: we orient the manifold $M$ by choosing a nonzero element of $A^k(T_{\mathbf x}M)$.

Definition 6.5.10 (Orientation of a $k$-dimensional manifold). An
orientation of a $k$-dimensional manifold $M \subset \mathbb R^n$ is an orientation of the
tangent space $T_{\mathbf x}M$ at every point $\mathbf x \in M$, so that the orientation varies
continuously with $\mathbf x$. To orient the tangent space, we choose a nonzero element
of $A^k(T_{\mathbf x}M)$.


6 The first gives the same orientation as $dx \wedge dy$, and the second gives the opposite
orientation: evaluated on the vectors of Equation 6.5.10, which we'll call $\vec v_1$ and $\vec v_2$,
they give

\[ dy \wedge dz(\vec v_1, \vec v_2) = \det\begin{bmatrix} b & d \\ -a-b & -c-d \end{bmatrix} = -bc - bd + ad + bd = ad - bc, \]

\[ dx \wedge dz(\vec v_1, \vec v_2) = \det\begin{bmatrix} a & c \\ -a-b & -c-d \end{bmatrix} = -(ad - bc). \]
Recall (Section 3.1) that the
tangent space to a smooth curve,
surface or manifold is the set of
vectors tangent to the curve,
surface or manifold, at the point of
tangency. The tangent space to
a curve $C$ at $\mathbf x$ is denoted $T_{\mathbf x}C$
and is one-dimensional; the tangent
space to a surface $S$ at $\mathbf x$ is
denoted $T_{\mathbf x}S$ and is two-dimensional,
and so on.



Figure 6.5.3. 

Top: $\gamma'(t)$ (the velocity vector
of the parametrization) points in
the same direction as the vector
orienting the curve; the parametrization
$\gamma$ preserves orientation.
Below: $\gamma'(t)$ points in the opposite
direction of the orientation; $\gamma$ is
orientation reversing.


Once again, we use a linearization (the tangent space) in order to deal with 
nonlinear objects (curves, surfaces, and higher-dimensional manifolds). 

What does it mean to say that the “orientation varies continuously with x”? 
This is best understood by considering a case where you cannot choose such an 
orientation, a Moebius strip. If you imagine yourself walking along the surface 
of a Moebius strip, planting a forest of normal vectors, one at each point, all 
pointing “up” (in the direction of your head), then when you get back to where 
you started there will be vectors arbitrarily close to each other, pointing in
opposite directions. 

The ad hoc world: when does a parametrization preserve 
orientation? 

We can now define what it means for a parametrization to preserve orientation.
For a curve, this means that the parameter increases in the specified direction:
a parametrization $\gamma : [a, b] \to C$ preserves orientation if $C$ is oriented from $\gamma(a)$
to $\gamma(b)$. The following definition spells this out; it is illustrated by Figure 6.5.3.

Definition 6.5.11 (Orientation-preserving parametrization of a
curve). Let $C \subset \mathbb R^n$ be a curve oriented by the choice of unit tangent
vector field $\vec T$. Then the parametrization $\gamma : (a, b) \to C$ is orientation
preserving if at every $t \in (a, b)$, we have

\[ \gamma'(t) \cdot \vec T(\gamma(t)) > 0. \tag{6.5.13} \]

Equation 6.5.13 says that the velocity vector of the parametrization points
in the same direction as the vector orienting the curve. Remember that

\[ \vec v_1 \cdot \vec v_2 = (\cos\theta)\,|\vec v_1|\,|\vec v_2|, \tag{6.5.14} \]

where $\theta$ is the angle between the two vectors. So the angle between $\gamma'(t)$ and
$\vec T(\gamma(t))$ is less than 90°. Since both vectors are tangent to the curve, the angle
must be either 0 or 180°, so it is 0.

It is harder to understand what it means for a parametrization of an oriented
surface to preserve orientation. In Definition 6.5.12, $D_1\gamma(\mathbf u)$ and $D_2\gamma(\mathbf u)$ are
two vectors tangent to the surface at $\gamma(\mathbf u)$.

Definition 6.5.12 (Orientation-preserving parametrization of a surface).
Let $S \subset \mathbb R^3$ be a surface oriented by a choice of normal vector field
$\vec N$. Let $U \subset \mathbb R^2$ be open and $\gamma : U \to S$ be a parametrization. Then $\gamma$ is
orientation preserving if at every $\mathbf u \in U$,

\[ \det[\vec N(\gamma(\mathbf u)), D_1\gamma(\mathbf u), D_2\gamma(\mathbf u)] > 0. \tag{6.5.15} \]
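The condition of Equation 6.5.15 can be tested pointwise. The sketch below does so for an example that is not taken from the text: the unit sphere with its outward normal, parametrized by longitude $\theta$ and latitude $\varphi$ as $\gamma(\theta, \varphi) = (\cos\varphi\cos\theta, \cos\varphi\sin\theta, \sin\varphi)$. The function names and the sampling grid are assumptions, and checking a finite grid of points illustrates the definition without proving anything.

```python
import math

def det3(c0, c1, c2):
    """Determinant of the 3x3 matrix with columns c0, c1, c2, computed as c0 . (c1 x c2)."""
    cross = (c1[1] * c2[2] - c1[2] * c2[1],
             c1[2] * c2[0] - c1[0] * c2[2],
             c1[0] * c2[1] - c1[1] * c2[0])
    return sum(a * b for a, b in zip(c0, cross))

def gamma(theta, phi):
    return (math.cos(phi) * math.cos(theta), math.cos(phi) * math.sin(theta), math.sin(phi))

def d1_gamma(theta, phi):
    # Partial derivative with respect to theta.
    return (-math.cos(phi) * math.sin(theta), math.cos(phi) * math.cos(theta), 0.0)

def d2_gamma(theta, phi):
    # Partial derivative with respect to phi.
    return (-math.sin(phi) * math.cos(theta), -math.sin(phi) * math.sin(theta), math.cos(phi))

def outward_normal(point):
    # On the unit sphere, the outward unit normal at x is x itself.
    return point

def orientation_dets(n=12):
    """(phi, det[N, D1 gamma, D2 gamma]) at the midpoints of an n-by-n grid of parameters."""
    results = []
    for i in range(n):
        theta = 2 * math.pi * (i + 0.5) / n
        for j in range(n):
            phi = -math.pi / 2 + math.pi * (j + 0.5) / n
            normal = outward_normal(gamma(theta, phi))
            results.append((phi, det3(normal, d1_gamma(theta, phi), d2_gamma(theta, phi))))
    return results

dets = orientation_dets()
print(all(value > 0 for _, value in dets))
```

Here the determinant works out to $\cos\varphi$, which is positive for $-\pi/2 < \varphi < \pi/2$; exchanging the roles of $\theta$ and $\varphi$ would flip its sign and give an orientation-reversing parametrization.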
Definition 6.5.13 (Orientation-preserving parametrization of an
open subset of $\mathbb R^3$). An open subset $U$ of $\mathbb R^3$ carries a standard
orientation, defined by the determinant. If $V$ is another open subset of $\mathbb R^3$, and
$\gamma : V \to U$ is a parametrization (i.e., a change of variables), then $\gamma$ is
orientation preserving if $\det[D\gamma(\mathbf v)] > 0$ for all $\mathbf v \in V$.


The unified approach: when does a parametrization 
preserve orientation? 

First let us define what it means for a linear transformation to be orientation 
preserving. 


In Definition 6.5.14, $\mathbb R^k$ is oriented
in the standard way, by det,
and

\[ \det(\vec e_1, \dots, \vec e_k) = 1 > 0. \]

If the orientation of $V$ by $\omega$ also
gives a positive number when applied
to $T(\vec e_1), \dots, T(\vec e_k)$, then $T$ is
orientation preserving.

Exercise 6.5.2 asks you to prove
that if a linear transformation $T$ is
not one to one, then it is not
orientation preserving or reversing.

In Definition 6.5.15, the derivative
$[D\gamma(\mathbf u)]$ is of course a linear
transformation; we use Definition
6.5.14 to determine whether it
preserves orientation. Since $U \subset \mathbb R^k$
is open, it is necessarily
$k$-dimensional.


Definition 6.5.14 (Orientation-preserving linear transformation). If
$V \subset \mathbb R^m$ is a $k$-dimensional subspace oriented by $\omega \in A^k(V)$, and $T : \mathbb R^k \to V$
is a linear transformation, $T$ is orientation preserving if

\[ \omega(T(\vec e_1), \dots, T(\vec e_k)) > 0. \tag{6.5.16} \]

It is orientation reversing if

\[ \omega(T(\vec e_1), \dots, T(\vec e_k)) < 0. \tag{6.5.17} \]

Note that for a linear transformation to preserve orientation, the domain and 
the range must have the same dimension, and they must be oriented. 

As usual, faced with a nonlinear problem, we linearize it: a (nonlinear) 
parametrization of a manifold is orientation preserving if the derivative of the 
parametrization is orientation preserving. 

Definition 6.5.15 (Orientation-preserving parametrization of a
manifold). Let $M$ be an oriented $k$-dimensional manifold, $U \subset \mathbb R^k$ be
an open set, and $\gamma : U \to M$ be a parametrization. Then $\gamma$ is orientation
preserving if $[D\gamma(\mathbf u)] : \mathbb R^k \to T_{\gamma(\mathbf u)}M$ is orientation preserving for every
$\mathbf u \in U$, i.e., if

\[ \omega([D\gamma(\mathbf u)](\vec e_1), \dots, [D\gamma(\mathbf u)](\vec e_k)) = \omega(D_1\gamma(\mathbf u), \dots, D_k\gamma(\mathbf u)) > 0. \]


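Example 6.5.16 (Orientation-preserving parametrization). Consider
the surface $S$ in $\mathbb C^3$ parametrized by

\[ z \mapsto \begin{pmatrix} z \\ z^2 \\ z^3 \end{pmatrix}, \qquad |z| < 1. \tag{6.5.18} \]

We will denote points in $\mathbb C^3$ by $\begin{pmatrix} z_1 \\ z_2 \\ z_3 \end{pmatrix} = \begin{pmatrix} x_1 + iy_1 \\ x_2 + iy_2 \\ x_3 + iy_3 \end{pmatrix}$.

Orient $S$, using $\omega = dx_1 \wedge dy_1$.

If we parametrize the surface by

\[ \gamma : \begin{pmatrix} r \\ \theta \end{pmatrix} \mapsto \begin{pmatrix} x_1 = r\cos\theta \\ y_1 = r\sin\theta \\ x_2 = r^2\cos 2\theta \\ y_2 = r^2\sin 2\theta \\ x_3 = r^3\cos 3\theta \\ y_3 = r^3\sin 3\theta \end{pmatrix}, \tag{6.5.19} \]

does that parametrization preserve orientation? It does, since

\[ \begin{aligned} dx_1 \wedge dy_1\bigl(D_1\gamma(\mathbf u), D_2\gamma(\mathbf u)\bigr) &= dx_1 \wedge dy_1\left(\begin{bmatrix} \cos\theta \\ \sin\theta \\ \vdots \end{bmatrix}, \begin{bmatrix} -r\sin\theta \\ r\cos\theta \\ \vdots \end{bmatrix}\right) \\ &= \det\begin{bmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{bmatrix} = r\cos^2\theta + r\sin^2\theta = r > 0. \quad △ \end{aligned} \tag{6.5.20} \]

In this parametrization we are
writing the complex number $z$ in
terms of polar coordinates:

\[ z = r(\cos\theta + i\sin\theta). \]

(See Equation 0.6.10).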


Exercise 6.5.4 asks you to show that our three ad hoc definitions of orienta- 
tion-preserving parametrizations are special cases of Definition 6.5.15. 


Compatibility of orientation-preserving parametrizations 


Recall (Definition 6.5.2) that
two parametrizations $\gamma_1$ and $\gamma_2$
with the “change of parameters”
map $\Phi = \gamma_2^{-1} \circ \gamma_1$ are compatible
if $\det[D\Phi] > 0$.


Theorem 6.5.3 said the result of integrating a $k$-form over an oriented manifold
does not depend on the choice of parametrization, as long as the parametrizations
induce compatible orientations. Now we show that the integral is
independent of parametrization if the parametrization is orientation preserving.
Most of the work was done in proving Theorem 6.5.3. The only thing we need
to show is that two orientation-preserving parametrizations define compatible
orientations.

Theorem 6.5.17 (Orientation-preserving parametrizations define
compatible orientations). If $M$ is an oriented $k$-manifold, $U_1$ and $U_2$
are open subsets of $\mathbb R^k$, and $\gamma_1 : U_1 \to M$, $\gamma_2 : U_2 \to M$ are orientation-preserving
parametrizations, then they define compatible orientations.


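Proof. Consider two points $\mathbf u_1 \in U_1$, $\mathbf u_2 \in U_2$ such that $\gamma_1(\mathbf u_1) = \gamma_2(\mathbf u_2) = \mathbf x \in M$.
The derivatives then give us maps

\[ \mathbb R^k \xrightarrow{\ [D\gamma_1(\mathbf u_1)]\ } T_{\mathbf x}M \xleftarrow{\ [D\gamma_2(\mathbf u_2)]\ } \mathbb R^k, \tag{6.5.21} \]

where both derivatives are one to one linear transformations. Moreover, we
have $\omega(\mathbf x) \ne 0$ in the one-dimensional vector space $A^k(T_{\mathbf x}M)$. What we must
show is that if

\[ \omega(\mathbf x)(D_1\gamma_1(\mathbf u_1), \dots, D_k\gamma_1(\mathbf u_1)) > 0 \quad\text{and}\quad \omega(\mathbf x)(D_1\gamma_2(\mathbf u_2), \dots, D_k\gamma_2(\mathbf u_2)) > 0, \]

then $\det\bigl([D\gamma_2(\mathbf u_2)]^{-1}[D\gamma_1(\mathbf u_1)]\bigr) > 0$.

Note that

\[ \begin{aligned} \omega(\mathbf x)([D\gamma_1(\mathbf u_1)](\vec v_1), \dots, [D\gamma_1(\mathbf u_1)](\vec v_k)) &= \alpha \det[\vec v_1, \dots, \vec v_k] \\ \omega(\mathbf x)([D\gamma_2(\mathbf u_2)](\vec w_1), \dots, [D\gamma_2(\mathbf u_2)](\vec w_k)) &= \beta \det[\vec w_1, \dots, \vec w_k] \end{aligned} \tag{6.5.22} \]

for some positive numbers $\alpha$ and $\beta$. Indeed, both left-hand sides are nonzero
elements of the one-dimensional vector space $A^k(\mathbb R^k)$, hence nonzero multiples
of the determinant, and they return positive values if evaluated on the standard
basis vectors. Now write

\[ \begin{aligned} \alpha &= \omega(\mathbf x)(D_1\gamma_1(\mathbf u_1), \dots, D_k\gamma_1(\mathbf u_1)) = \omega(\mathbf x)([D\gamma_1(\mathbf u_1)]\vec e_1, \dots, [D\gamma_1(\mathbf u_1)]\vec e_k) \\ &= \omega(\mathbf x)\bigl([D\gamma_2(\mathbf u_2)][D\gamma_2(\mathbf u_2)]^{-1}[D\gamma_1(\mathbf u_1)]\vec e_1, \dots, [D\gamma_2(\mathbf u_2)][D\gamma_2(\mathbf u_2)]^{-1}[D\gamma_1(\mathbf u_1)]\vec e_k\bigr) \\ &= \beta \det\bigl[[D\gamma_2(\mathbf u_2)]^{-1}[D\gamma_1(\mathbf u_1)]\vec e_1, \dots, [D\gamma_2(\mathbf u_2)]^{-1}[D\gamma_1(\mathbf u_1)]\vec e_k\bigr] \\ &= \beta \det\bigl([D\gamma_2(\mathbf u_2)]^{-1}[D\gamma_1(\mathbf u_1)]\bigr) \det[\vec e_1, \dots, \vec e_k] \\ &= \beta \det\bigl([D\gamma_2(\mathbf u_2)]^{-1}[D\gamma_1(\mathbf u_1)]\bigr). \end{aligned} \tag{6.5.23} \]

Since $\alpha$ and $\beta$ are both positive, so is the determinant. □

Remember that

\[ \gamma_1(\mathbf u_1) = \gamma_2(\mathbf u_2) = \mathbf x; \]

in the first line of Equation 6.5.22,
$\mathbf x = \gamma_1(\mathbf u_1)$; in the second line,
$\mathbf x = \gamma_2(\mathbf u_2)$.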

Corollary 6.5.18 (Integral independent of orientation-preserving
parametrizations). Let $M$ be an oriented $k$-manifold, $U$ and $V$ be open
subsets of $\mathbb R^k$, and $\gamma_1 : U \to M$, $\gamma_2 : V \to M$ be orientation-preserving
parametrizations of $M$. Then for any $k$-form $\varphi$ defined on a neighborhood
of $M$, we have

\[ \int_{\gamma_1(U)} \varphi = \int_{\gamma_2(V)} \varphi. \]



Integrating form fields over oriented manifolds 

Now we know everything we need to know in order to integrate form fields over
oriented manifolds. We saw in Section 5.4 how to integrate form fields over
parametrized domains. Corollary 6.5.18 says that we can use the same formula
to integrate over oriented manifolds, as long as we use an orientation-preserving
parametrization. This gives the following:

Definition 6.5.19 (Integral of a form field over an oriented manifold).
Let $M$ be a $k$-dimensional oriented manifold, $\varphi$ be a $k$-form field
on a neighborhood of $M$, and $\gamma : U \to M$ be any orientation-preserving
parametrization of $M$. Then

$$\int_M \varphi = \int_{[\gamma(U)]} \varphi.$$



This is an example of the first class of parametrizations listed in
Section 5.2, parametrizations as graphs; see Equation 5.2.5.

Example 6.5.20 (Integrating a flux form over an oriented surface).
What is the flux of the vector field
$\vec F\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{bmatrix} y \\ -x \\ z \end{bmatrix}$
through the piece of the plane $P$ defined by $x + y + z = 1$ where $x, y, z \ge 0$,
and which is oriented by the normal $\vec N = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$?

This surface is the graph of $z = 1 - x - y$, so that

$$\gamma\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ y \\ 1 - x - y \end{pmatrix} \tag{6.5.25}$$

is a parametrization, if $x$ and $y$ are in the triangle $T \subset \mathbb{R}^2$ given by
$x, y \ge 0$, $x + y \le 1$. Moreover, this parametrization preserves orientation
(see Definition 6.5.12), since $\det[\vec N(\gamma(\mathbf u)), D_1\gamma(\mathbf u), D_2\gamma(\mathbf u)]$ is

$$\det\begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & -1 & -1 \end{bmatrix} = 3 > 0.$$

At right we check that $\gamma$ preserves orientation.

By Definition 6.4.6, the flux is

$$\int_P \Phi_{\vec F} = \int_T \det\big[\vec F(\gamma(\mathbf u)), D_1\gamma(\mathbf u), D_2\gamma(\mathbf u)\big]\,|dx\,dy| = \int_T \det\begin{bmatrix} y & 1 & 0 \\ -x & 0 & 1 \\ 1 - x - y & -1 & -1 \end{bmatrix} |dx\,dy|. \tag{6.5.26}$$

Now we compute the integral:

$$\int_T (1 - 2x)\,|dx\,dy| = \int_0^1 \left( \int_0^{1-y} (1 - 2x)\,dx \right) dy = \frac{1}{6}. \quad \triangle \tag{6.5.27}$$

Note that the formula for integrating a flux form over a surface
in $\mathbb{R}^3$ enables us to transform an integral over a surface in $\mathbb{R}^3$ into
an integral over a piece of $\mathbb{R}^2$, as studied in Chapter 4.

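The flux computation of Example 6.5.20 can be checked numerically. The sketch below is only an illustration, not part of the text's method: the function names (`det3`, `flux`) and the midpoint rule are choices made here. It evaluates $\det[\vec F(\gamma(\mathbf u)), D_1\gamma, D_2\gamma]$ on a grid over the triangle $T$ and sums; the exact integral of $1 - 2x$ over $T$ is $1/6$.

```python
def det3(a, b, c):
    """Determinant of the 3x3 matrix whose columns are a, b, c (triple product a . (b x c))."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            + a[1] * (b[2] * c[0] - b[0] * c[2])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

def F(x, y, z):
    """The vector field of the example."""
    return (y, -x, z)

def gamma(x, y):
    """Graph parametrization of the plane x + y + z = 1."""
    return (x, y, 1.0 - x - y)

D1 = (1.0, 0.0, -1.0)   # partial derivative of gamma with respect to x
D2 = (0.0, 1.0, -1.0)   # partial derivative of gamma with respect to y
N = (1.0, 1.0, 1.0)     # orienting normal

def flux(n=200):
    """Midpoint-rule approximation of the integral over the triangle T."""
    total = 0.0
    hy = 1.0 / n
    for j in range(n):
        y = (j + 0.5) * hy
        hx = (1.0 - y) / n          # x runs from 0 to 1 - y
        for i in range(n):
            x = (i + 0.5) * hx
            total += det3(F(*gamma(x, y)), D1, D2) * hx * hy
    return total

print("orientation check:", det3(N, D1, D2))
print("flux estimate:", flux())
```

A positive orientation check confirms that the graph parametrization is orientation-preserving for the chosen normal, which is what allows the single formula to be used.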
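Example 6.5.21 (Integrating a 2-form field over a parametrized surface
in $\mathbb{C}^3 = \mathbb{R}^6$). Consider again the surface $S$ in $\mathbb{C}^3$ of Example 6.5.16.
What is

$$\int_S dx_1 \wedge dy_1 + dx_2 \wedge dy_2 + dx_3 \wedge dy_3\ ? \tag{6.5.28}$$

As in Example 6.5.16, parametrize the surface by

$$\gamma\begin{pmatrix} r \\ \theta \end{pmatrix} = \begin{pmatrix} r\cos\theta \\ r\sin\theta \\ r^2\cos 2\theta \\ r^2\sin 2\theta \\ r^3\cos 3\theta \\ r^3\sin 3\theta \end{pmatrix} \qquad (6.5.19)$$

which we know from that example preserves orientation. Then

$$\begin{aligned}
&(dx_1 \wedge dy_1 + dx_2 \wedge dy_2 + dx_3 \wedge dy_3)\Big(P_{\gamma(r,\theta)}\big(D_1\gamma(r,\theta), D_2\gamma(r,\theta)\big)\Big) \\
&\quad = \det\begin{bmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{bmatrix}
+ \det\begin{bmatrix} 2r\cos 2\theta & -2r^2\sin 2\theta \\ 2r\sin 2\theta & 2r^2\cos 2\theta \end{bmatrix}
+ \det\begin{bmatrix} 3r^2\cos 3\theta & -3r^3\sin 3\theta \\ 3r^2\sin 3\theta & 3r^3\cos 3\theta \end{bmatrix} \\
&\quad = r + 4r^3 + 9r^5.
\end{aligned} \tag{6.5.29}$$

Finally, we find for our integral:

$$2\pi \int_0^1 (r + 4r^3 + 9r^5)\,dr = 2\pi\left(\tfrac12 + 1 + \tfrac32\right) = 6\pi. \quad \triangle \tag{6.5.30}$$

Note that in both cases in Example 6.5.23 we are integrating
over the same oriented point, $x = +2$. We use curly brackets to
avoid confusion between integrating over the point $+2$ with negative
orientation, and integrating over the point $-2$.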
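As a sanity check on Example 6.5.21, the sketch below evaluates the 2-form $dx_1 \wedge dy_1 + dx_2 \wedge dy_2 + dx_3 \wedge dy_3$ on the partial derivatives of $\gamma$ and integrates over $0 \le r \le 1$, $0 \le \theta \le 2\pi$ (the domain read off from the integral $2\pi\int_0^1$). The helper names and the midpoint rule are assumptions of this illustration. Since $\int_0^1 (r + 4r^3 + 9r^5)\,dr = 3$, the result should be close to $6\pi$.

```python
import math

def partials(r, t):
    """Partial derivatives D1 (in r) and D2 (in theta) of gamma, in coordinates
    ordered (x1, y1, x2, y2, x3, y3)."""
    d1 = (math.cos(t), math.sin(t),
          2 * r * math.cos(2 * t), 2 * r * math.sin(2 * t),
          3 * r**2 * math.cos(3 * t), 3 * r**2 * math.sin(3 * t))
    d2 = (-r * math.sin(t), r * math.cos(t),
          -2 * r**2 * math.sin(2 * t), 2 * r**2 * math.cos(2 * t),
          -3 * r**3 * math.sin(3 * t), 3 * r**3 * math.cos(3 * t))
    return d1, d2

def form_value(r, t):
    """Sum of the three 2x2 minors picked out by dx_j ^ dy_j, j = 1, 2, 3."""
    d1, d2 = partials(r, t)
    return sum(d1[2 * j] * d2[2 * j + 1] - d1[2 * j + 1] * d2[2 * j]
               for j in range(3))

def integral(n_r=400, n_t=64):
    """Midpoint rule over the rectangle 0 <= r <= 1, 0 <= theta <= 2*pi."""
    hr, ht = 1.0 / n_r, 2 * math.pi / n_t
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * hr
        for j in range(n_t):
            t = (j + 0.5) * ht
            total += form_value(r, t) * hr * ht
    return total

print(integral(), 6 * math.pi)
```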


For completeness, we show the case where $\varphi$ is a 0-form field:

Example 6.5.22 (Integrating a 0-form over an oriented point). Let
$\mathbf x$ be an oriented point, and $f$ a function (i.e., a 0-form field) defined in some
neighborhood of $\mathbf x$. Then

$$\int_{+\mathbf x} f = +f(\mathbf x) \quad\text{and}\quad \int_{-\mathbf x} f = -f(\mathbf x). \quad \triangle \tag{6.5.31}$$

Example 6.5.23 (Integrating over an oriented point).

$$\int_{+\{+2\}} x^2 = 4 \quad\text{and}\quad \int_{-\{+2\}} x^2 = -4. \quad \triangle \tag{6.5.32}$$

6.6 Boundary Orientation

We need orientation of domains and their boundaries so that we
can integrate and differentiate forms, but orientation is important
for other reasons. Homology theory, one of the big branches of
algebraic topology, is an enormous abstraction of the constructions in
our discussion of the unified approach to orientation.


Stokes’s theorem, the generalization of the fundamental theorem of calculus, is 
all about comparing integrals over manifolds and integrals over their boundaries. 
Here we will define exactly what a “manifold with boundary” is; we will see 
moreover that if a “manifold with boundary” is oriented, its boundary carries 
a natural orientation, called, naturally enough, the boundary orientation. 



Figure 6.6.1.

This figure illustrates our definition of a piece-with-boundary of
a manifold; above, $M$ is a manifold and $X$ a piece-with-boundary of
that manifold. Locally, the manifold $M$ is a graph; every point
$\mathbf x \in M$ has a neighborhood $U$ where $M \cap U$ is the graph of a mapping.
The part of that neighborhood within $X$ lies above the region
$G_i \ge 0$ in the domain.

In Definition 6.6.1, part (1) deals with the parts of the
"piece-with-boundary" that are inside the piece; part (2) deals with the
boundary.

A diffeomorphism is a differentiable mapping with differentiable
inverse.

You may think of a "piece-with-boundary" of a $k$-dimensional manifold as a
piece one can carve out of the manifold, such that the boundary of the piece
is part of it (the piece is thus closed). However, the boundary can't have
any arbitrary shape. In many treatments the boundaries are restricted to being
smooth. In such a treatment, if the manifold is three-dimensional, the boundary
of a piece of the manifold must be a smooth surface; if it is two-dimensional,
the boundary must be a smooth curve.

We will be less restrictive, and will allow our boundaries to have corners.
There are two reasons for this. First, in many cases, we wish to apply Stokes's
theorem to things like the region in the sphere where, in spherical coordinates,
$0 \le \theta \le \pi/2$, and such a region has corners (at the poles). Second, we would like
$k$-parallelograms to be manifolds with boundary, and they most definitely have
corners. Fortunately, allowing our boundaries to have corners doesn't make any
of the proofs more difficult.

However, we won't allow the boundaries to be just anything: the boundary
can't be fractal, like the Koch snowflake we saw in Section 5.6; neither can it
contain cusps. (Fractals would really cause problems; cusps would be acceptable,
but would make our definitions too involved.) You should think that a
region of the boundary either is smooth or contains a corner. Being smooth
means being a manifold: locally the graph of a function of some variables in
terms of others. What do we mean by corner? Roughly (we will be painfully
rigorous below), you should think of the kind of curvilinear "angles" you can
get if you drew the $(x, y)$-plane on a piece of rubber and stretched it, or if you
squashed a cube made of foam rubber.

Definition 6.6.1 is illustrated by Figure 6.6.1. 

Definition 6.6.1 (Piece-with-boundary of a manifold). Let $M \subset \mathbb{R}^n$
be a $k$-dimensional manifold. A subset $X \subset M$ will be called a piece-with-boundary
if for every $\mathbf x \in X$, there exist

(1) Open subsets $U_1 \subset E_1$ and $U \subset \mathbb{R}^n$ with $\mathbf x \in U$, and $f : U_1 \to E_2$ a
$C^1$ mapping such that $M \cap U$ is the graph of $f$. (This is Definition
3.2.2 of a manifold.)

(2) A diffeomorphism $G = \begin{pmatrix} G_1 \\ \vdots \\ G_k \end{pmatrix} : U_1 \to \mathbb{R}^k$
such that $X \cap U$ is $f(X_1)$, where $X_1 \subset U_1$ is the subset where
$G_1 \ge 0, \dots, G_k \ge 0$.


Example 6.6.2 (A $k$-parallelogram seen as a piece-with-boundary of a
manifold). A $k$-parallelogram $P^o_{\mathbf x}(\vec v_1, \dots, \vec v_k)$ in $\mathbb{R}^n$ is a piece-with-boundary
of an oriented $k$-dimensional submanifold of $\mathbb{R}^n$ when the vectors $\vec v_1, \dots, \vec v_k$ are
linearly independent. Indeed, if $M \subset \mathbb{R}^n$ is the set parametrized by

$$\begin{pmatrix} t_1 \\ \vdots \\ t_k \end{pmatrix} \mapsto \mathbf x + t_1\vec v_1 + \dots + t_k\vec v_k, \tag{6.6.1}$$

then $M$ is a $k$-dimensional manifold in $\mathbb{R}^n$. It is the translation by $\mathbf x$ of the
subspace spanned by $\vec v_1, \dots, \vec v_k$ (in general it is not itself a subspace, because it
need not contain the origin). For every $\mathbf a \in M$, the tangent space $T_{\mathbf a}M$ is the space
spanned by $\vec v_1, \dots, \vec v_k$. The manifold $M$ is oriented by the choice of a nonzero
element $\omega \in A^k(T_{\mathbf a}M)$, and $\omega$ gives the standard orientation if

$$\omega(\vec v_1, \dots, \vec v_k) > 0. \tag{6.6.2}$$

The $k$-parallelogram $P^o_{\mathbf x}(\vec v_1, \dots, \vec v_k)$ is a piece-with-boundary of $M$, and thus it
carries the orientation of $M$. △

Definition 6.6.3 distinguishes between the smooth boundary and
the rest (with corners).

The $G_i$ should be thought of as coordinate functions. Think of
$(x, y, z)$-space, i.e., $k = 3$. The $(x, z)$-plane is the set where $y$ vanishes,
the $(y, z)$-plane is the set where $x$ vanishes, and the $(x, y)$-plane
is the set where $z$ vanishes. This corresponds to the $m = 2$ dimensional
stratum, where $k - m$ (i.e., $3 - 2 = 1$) of the $G_i$ vanish.
Similarly, the $x$ axis is the set where $y$ and $z$ vanish; this corresponds
to part of the $m = 1$ dimensional stratum, where $k - m$
(i.e., $3 - 1 = 2$) of the $G_i$ vanish.

We will be interested only in the inside (the $k$-dimensional stratum)
and the smooth boundary (the $(k - 1)$-dimensional stratum),
since Stokes's theorem relates the integrals of $k$-forms and
$(k - 1)$-forms.

Definition 6.6.3 (Boundary of a piece-with-boundary of a manifold).
If $X$ is a piece-with-boundary of a manifold $M$, its boundary $\partial X$ is the set
of points where at least one of the $G_i = 0$; the smooth boundary is the set
where exactly one of the $G_i$ vanishes.

Remark. We can think of a piece-with-boundary of a $k$-dimensional manifold
as composed of strata of various dimensions: the interior of the piece and
the various strata of the boundary, just as a cube is stratified into its interior
and its two-dimensional faces, one-dimensional edges, and 0-dimensional vertices.
When integrating a $k$-form over a piece-with-boundary of a $k$-dimensional
manifold, we can disregard the boundary; similarly, when integrating a $(k - 1)$-form
over the boundary, we can ignore strata of dimension less than $k - 1$. More
precisely, the $m$-dimensional stratum of the boundary is the set where exactly
$k - m$ of the $G_i$ of Definitions 6.6.1 and 6.6.3 vanish, so the inside of the piece
is the $k$-dimensional stratum, the smooth boundary is the $(k - 1)$-dimensional
stratum, etc. The $m$-dimensional stratum is an $m$-dimensional manifold in $\mathbb{R}^n$,
hence has $m'$-dimensional volume 0 for any $m' > m$ (see Exercise 5.2.5); it can
be ignored when integrating $m'$-forms. △
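The counting rule in the remark (the $m$-dimensional stratum is where exactly $k - m$ of the $G_i$ vanish) can be made concrete with a small sketch. The function name `stratum_dimension` and the tolerance parameter are hypothetical choices for this illustration; the input is the list of values $G_1, \dots, G_k$ at a point.

```python
def stratum_dimension(g_values, tol=0.0):
    """Dimension m of the stratum containing a point, given the values
    G_1, ..., G_k there: m = k - (number of vanishing G_i).
    Returns None if some G_i is negative, i.e. the point is outside the piece."""
    if any(g < -tol for g in g_values):
        return None
    k = len(g_values)
    vanishing = sum(1 for g in g_values if abs(g) <= tol)
    return k - vanishing

# Near a corner of a cube, with G = (x, y, z) as coordinate functions (k = 3):
print(stratum_dimension((0.3, 0.2, 0.5)))   # interior point
print(stratum_dimension((0.0, 0.2, 0.5)))   # point on a face (smooth boundary)
print(stratum_dimension((0.0, 0.0, 0.5)))   # point on an edge
print(stratum_dimension((0.0, 0.0, 0.0)))   # the vertex
```

Only the strata of dimension $k$ and $k - 1$ matter for Stokes's theorem, so in practice one would test for the return values `k` and `k - 1`.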

Boundary orientation: the ad hoc world 

The faces of a cube are oriented by the outward-pointing normal, but the other 
strata of the boundary carry no distinguished orientation at all: there is no 
particularly natural way to draw an arrow on the edges. More generally, we 
will only be able to orient the smooth boundary of a piece-with-boundary. 

Figure 6.6.2.

The boundary of the shaded region of $\mathbb{R}^2$ consists of the three
curves drawn, with the indicated orientations. If you walk along
those curves, the region will always be to your left.

In Definition 6.6.6, the question whether $\vec N$ should be put first or
last, or whether one should use vectors that point inward or outward,
is entirely a matter of convention. The order we use is standard,
but not universal.

The oriented boundary of a piece-with-boundary of an oriented curve is simply
its endpoint minus its beginning point:


Definition 6.6.4 (Oriented boundary of a piece-with-boundary of
an oriented curve). Let $C$ be a curve oriented by the unit tangent vector
field $\vec T$, and let $P \subset C$ be a piece-with-boundary of $C$. Then the oriented
boundary of $P$ consists of the two endpoints of $P$, taken with sign $+1$ if the
tangent vector points out of $P$ at that point, and with sign $-1$ if it points in.

If the piece-with-boundary consists of several such $P_i$, its oriented boundary
is the sum of all the endpoints, each taken with the appropriate sign.

Definition 6.6.5 (Oriented boundary of a piece-with-boundary of
$\mathbb{R}^2$). If $U \subset \mathbb{R}^2$ is a two-dimensional piece-with-boundary, then its boundary
is a union of smooth curves $C_i$. We orient all the $C_i$ so that if you walk along
them in that direction, $U$ will be to your left, as shown in Figure 6.6.2.

When $\mathbb{R}^2$ is given its standard orientation by $+\det$, Definition 6.6.5 says
that when you walk on the curves, your head is pointing in the direction of the
$z$-axis. With this definition, the boundary of the unit disk $\{x^2 + y^2 \le 1\}$ is the
unit circle oriented counterclockwise.

For a surface in $\mathbb{R}^3$ oriented by a unit normal, the normal vector field tells
you on which side of the surface to walk. Let $S \subset \mathbb{R}^3$ be a surface oriented by
a normal vector field $\vec N$, and let $U$ be a piece-with-boundary of $S$, bounded by
some union of curves $C_i$. An obvious example is the upper hemisphere bounded
by the equator. If you walk along the boundary so that your head points in
the direction of $\vec N$, and $U$ is to your left, you are walking in the direction
of the boundary orientation. Translating this into mathematically meaningful
language gives the following, illustrated by Figure 6.6.3.

Definition 6.6.6 (Oriented boundary of a piece-with-boundary of an
oriented surface). Let $S \subset \mathbb{R}^3$ be a surface oriented by a normal vector
field $\vec N$, and let $S_1$ be a piece-with-boundary of $S$, bounded by some union
of closed curves $C_i$. At a point $\mathbf x \in C_i$, let $\vec v_{out}$ be a vector tangent to $S$
and pointing out of $S_1$. Then the boundary orientation is defined by the unit
vector $\vec v$ tangent to $C_i$, chosen so that

$$\det[\vec N(\mathbf x), \vec v_{out}, \vec v] > 0. \tag{6.6.3}$$


Since the system composed of your head, your right arm, and the direction in
which you are walking also satisfies the right-hand rule, this means that to walk
in the direction of $\partial S_1$, you should walk with your head in the direction of $\vec N$,
and the surface to your left.

Figure 6.6.3.

The shaded area is the piece-with-boundary $S_1$ of the surface
$S$. The vector $\vec v_{out}$ is tangent to $S$ at a point in the boundary $C_i$ of
$S_1$, and points out of $S_1$. The unit vector $\vec v$ is tangent to $C_i$. Since

$$\det[\vec N(\mathbf x), \vec v_{out}, \vec v] > 0,$$

the boundary orientation is defined by $\vec v$. If we rotated $\vec v$ by
180°, then the vectors would obey the left-hand rule instead of the
right-hand rule, and the orientation would be reversed.

For a 2-manifold, i.e., a surface, the "outward-pointing vector tangent
to $M$" is illustrated by Figure 6.6.3.

In Definition 6.6.9, $\omega$ is a $k$-form and $\omega_\partial$ is a $(k - 1)$-form, so
the two can't be equal; Equation 6.6.5 says not that the forms are
equal, but that evaluated on the appropriate vectors, they return
the same number.

Finally let's consider the three-dimensional case:


Definition 6.6.7 (Oriented boundary of a piece-with-boundary of
$\mathbb{R}^3$). Let $U \subset \mathbb{R}^3$ be a piece-with-boundary of $\mathbb{R}^3$, whose smooth boundary
is a union of surfaces $S_i$. We will suppose that $U$ is given the standard
orientation of $\mathbb{R}^3$. Then the orientation of the boundary of $U$ (i.e., the
orientation of the surfaces) is specified by the outward-pointing normal.


Boundary orientation: the unified approach 

Now we will see that our ad hoc definitions of oriented boundaries of curves,
surfaces, and open subsets of $\mathbb{R}^3$ are all special cases of a general definition.
We need first to define outward-pointing vectors.

Let $M \subset \mathbb{R}^n$ be a manifold, $X \subset M$ a piece-with-boundary, and $\mathbf x \in \partial X$
a point of the smooth boundary of $X$. At $\mathbf x$, the tangent space $T_{\mathbf x}(\partial X)$ is a
subspace of $T_{\mathbf x}X$ whose dimension is one less than the dimension of $T_{\mathbf x}X$, and
which subdivides the tangent space into the outward-pointing vectors and the
inward-pointing vectors.


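Definition 6.6.8 (Outward-pointing and inward-pointing vectors).
Let $\vec v \in T_{\mathbf x}(\partial X)$ and write

$$\vec v = \begin{bmatrix} \vec v_1 \\ \vec v_2 \end{bmatrix} \quad\text{with } \vec v_1 \in E_1,\ \vec v_2 \in E_2. \tag{6.6.4}$$

Then $\vec v$ is

outward pointing if $[Dp(\mathbf x_1)]\vec v_1 > 0$, and
inward pointing if $[Dp(\mathbf x_1)]\vec v_1 < 0$.

If $\vec v$ is outward pointing, we denote it $\vec v_{out}$.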

Definition 6.6.9 (Oriented boundary of piece-with-boundary of an
oriented manifold). Let $M$ be a $k$-dimensional manifold oriented by $\omega$,
and $P$ be a piece-with-boundary of $M$. Let $\mathbf x$ be in $\partial P$, and $\vec v_{out} \in T_{\mathbf x}M$ be
an outward-pointing vector tangent to $M$. Then, at $\mathbf x$, the boundary $\partial P$ of
$P$ is oriented by $\omega_\partial$, where

$$\underbrace{\omega_\partial(\vec v_1, \dots, \vec v_{k-1})}_{\text{orienting boundary}} = \underbrace{\omega(\vec v_{out}, \vec v_1, \dots, \vec v_{k-1})}_{\text{orienting manifold}}. \tag{6.6.5}$$


Example 6.6.10 (Oriented boundary of a piece-with-boundary of an
oriented curve). If $C$ is a curve oriented by $\omega$, and $P$ is a piece-with-boundary
of $C$, then at an endpoint $\mathbf x$ of $P$ (i.e., a point in $\partial P$), with an outward-pointing
vector $\vec v_{out}$ anchored at $\mathbf x$, the boundary point $\mathbf x$ is oriented by the nonzero
number $\omega_\partial = \omega(\vec v_{out})$. Thus it has the sign $+1$ if $\omega_\partial$ is positive, and the sign
$-1$ if $\omega_\partial$ is negative. (In this case, $\omega$ takes only one vector.)

This is consistent with the ad hoc definition (Definition 6.6.4). If $\omega(\vec v) = \vec t \cdot \vec v$,
then the condition $\omega_\partial > 0$ means exactly that $\vec t(\mathbf x)$ points out of $P$. △

If we think of $\mathbb{R}^2$ as the horizontal plane in $\mathbb{R}^3$, then a piece-with-boundary
of $\mathbb{R}^2$ is a special case of a piece-with-boundary of
an oriented surface. Our definitions in the two cases coincide; in
the ad hoc language, this means that the orientation of $\mathbb{R}^2$ by $\det$
is the orientation defined by the normal pointing upward.

Equation 6.6.9 is justified in the subsection, "Equivalence of the ad
hoc and the unified approaches for subspaces of $\mathbb{R}^3$."

Example 6.6.11 (Oriented boundary of a piece-with-boundary of $\mathbb{R}^2$).
Let the smooth curve $C$ be the smooth boundary of a piece-with-boundary $S$ of
$\mathbb{R}^2$. If $\mathbb{R}^2$ is oriented in the standard way (i.e., by $\det$), then at a point $\mathbf x \in C$,
the boundary $C$ is oriented by

$$\omega_\partial(\vec v) = \det(\vec v_{out}, \vec v). \tag{6.6.6}$$

Suppose we have drawn the standard basis vectors in the plane in the standard
way, with $\vec e_2$ counterclockwise from $\vec e_1$. Then

$$\det(\vec v_{out}, \vec v) > 0 \tag{6.6.7}$$

if, when you look in the direction of $\vec v$, the vector $\vec v_{out}$ is on your right. In this
case $S$ is on your left, as was already shown in Figure 6.6.2. △
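A quick numerical illustration of Equation 6.6.6 for the unit disk: at a boundary point the radial vector points out of the disk, and the sign of $\det(\vec v_{out}, \vec v)$ tells us which tangent direction carries the boundary orientation. This is a sketch with hypothetical helper names (`det2`, `omega_boundary`), not notation from the text.

```python
import math

def det2(a, b):
    """Determinant of the 2x2 matrix with columns a and b."""
    return a[0] * b[1] - a[1] * b[0]

def omega_boundary(theta, tangent):
    """omega_boundary(tangent) = det(v_out, tangent) at the point
    (cos theta, sin theta) of the unit circle, where v_out is radial."""
    v_out = (math.cos(theta), math.sin(theta))
    return det2(v_out, tangent)

theta = 0.7
counterclockwise = (-math.sin(theta), math.cos(theta))
clockwise = (math.sin(theta), -math.cos(theta))

print(omega_boundary(theta, counterclockwise))  # positive: boundary orientation
print(omega_boundary(theta, clockwise))         # negative: opposite orientation
```

The positive sign for the counterclockwise tangent agrees with the statement that the boundary of the unit disk is the unit circle oriented counterclockwise.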

Example 6.6.12 (Oriented boundary of a piece-with-boundary of an
oriented surface in $\mathbb{R}^3$). Let $S_1 \subset S$ be a piece-with-boundary of an oriented
surface $S$. Suppose that at $\mathbf x \in \partial S_1$, $S$ is oriented by $\omega \in A^2(T_{\mathbf x}(S))$, and that
$\vec v_{out} \in T_{\mathbf x}S$ is tangent to $S$ at $\mathbf x$ but points out of $S_1$. Then the curve $\partial S_1$ is
oriented by

$$\omega_\partial(\vec v) = \omega(\vec v_{out}, \vec v). \tag{6.6.8}$$

This is consistent with the ad hoc definition, illustrated by Figure 6.6.3. In
the ad hoc definition, where $S$ is oriented by a normal vector field $\vec N$, the
corresponding $\omega$ is

$$\omega(\vec v_1, \vec v_2) = \det(\vec N(\mathbf x), \vec v_1, \vec v_2), \tag{6.6.9}$$

so that

$$\omega_\partial(\vec v) = \det(\vec N(\mathbf x), \vec v_{out}, \vec v). \tag{6.6.10}$$

Thus if the vectors $\vec e_1, \vec e_2, \vec e_3$ are drawn in the standard way, satisfying the
right-hand rule, then $\vec v$ defines the orientation of $\partial S_1$ if $\vec N(\mathbf x), \vec v_{out}, \vec v$ satisfy the
right-hand rule also. △
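Equation 6.6.10 can be tried out on the upper hemisphere of the unit sphere, oriented by the outward normal and bounded by the equator. In the sketch below (all names are assumptions of this illustration), at an equator point the normal is radial, the vector tangent to the sphere and pointing out of the upper hemisphere points straight down, and we test the two possible tangent directions of the equator.

```python
import math

def det3(a, b, c):
    """Determinant of the 3x3 matrix whose columns are a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            + a[1] * (b[2] * c[0] - b[0] * c[2])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

def omega_boundary(theta, tangent):
    """det(N(x), v_out, tangent) at the equator point (cos theta, sin theta, 0)."""
    normal = (math.cos(theta), math.sin(theta), 0.0)  # outward normal of the unit sphere
    v_out = (0.0, 0.0, -1.0)                          # tangent to sphere, leaving the upper half
    return det3(normal, v_out, tangent)

theta = 1.1
counterclockwise = (-math.sin(theta), math.cos(theta), 0.0)  # seen from above
clockwise = (math.sin(theta), -math.cos(theta), 0.0)

print(omega_boundary(theta, counterclockwise))  # positive
print(omega_boundary(theta, clockwise))         # negative
```

The positive value for the counterclockwise direction matches the walking rule: head along the normal, hemisphere to the left.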

Example 6.6.13 (Oriented boundary of a piece-with-boundary of $\mathbb{R}^3$).
Suppose $U$ is a piece-with-boundary of $\mathbb{R}^3$ with boundary $\partial U = S$, and $U$ is
oriented in the standard way, by $\det$. Then $S$ is oriented by

$$\omega_\partial(\vec v_1, \vec v_2) = \det(\vec v_{out}, \vec v_1, \vec v_2). \tag{6.6.11}$$

If we wish to think of orienting $S$ in the ad hoc language, i.e., by a field
of normals $\vec N$, this means exactly that for any $\mathbf x \in S$ and any two vectors
$\vec v_1, \vec v_2 \in T_{\mathbf x}S$, the two numbers

$$\det(\vec N(\mathbf x), \vec v_1, \vec v_2) \quad\text{and}\quad \det(\vec v_{out}, \vec v_1, \vec v_2) \tag{6.6.12}$$

should have the same sign, i.e., $\vec N(\mathbf x)$ should point out of $U$. △

Figure 6.6.4.

The cube spanned by the vectors $\vec v_1$, $\vec v_2$, and $\vec v_3$, anchored at
$\mathbf x$. To lighten notation we set $\mathbf a = \mathbf x + \vec v_1$, $\mathbf b = \mathbf x + \vec v_2$, and
$\mathbf c = \mathbf x + \vec v_3$. The original three vectors are drawn in dark lines;
the translates in lighter or dotted lines. The cube's boundary is its
six faces:

$P_{\mathbf x}(\vec v_1, \vec v_2)$   the bottom
$P_{\mathbf x}(\vec v_2, \vec v_3)$   the left side
$P_{\mathbf x}(\vec v_1, \vec v_3)$   the front
$P_{\mathbf a}(\vec v_2, \vec v_3)$   the right side
$P_{\mathbf b}(\vec v_1, \vec v_3)$   the back
$P_{\mathbf c}(\vec v_1, \vec v_2)$   the top

If $i$ is odd, the expression

$$P^o_{\mathbf x + \vec v_i}(\vec v_1, \dots, \widehat{\vec v_i}, \dots, \vec v_k) - P^o_{\mathbf x}(\vec v_1, \dots, \widehat{\vec v_i}, \dots, \vec v_k)$$

is preceded by a plus sign; if $i$ is even, it is preceded by a minus
sign.

Did you expect the right-hand side of Equation 6.6.14 to be

$$P_{\mathbf x + \vec v}(\vec v) - P_{\mathbf x}(\vec v)?$$

Remember that $\vec v$ is precisely the $\vec v_i$ that is being omitted.

The oriented boundary of an oriented $k$-parallelogram

We saw above that an oriented $k$-parallelogram $P^o_{\mathbf x}(\vec v_1, \dots, \vec v_k)$ is a piece-with-boundary
of an oriented manifold if the vectors $\vec v_1, \dots, \vec v_k$ are linearly independent
(i.e., the parallelogram is not squished flat). As such its boundary carries
an orientation.

Proposition 6.6.14 (Oriented boundary of an oriented $k$-parallelogram).
The oriented boundary of an oriented $k$-parallelogram $P^o_{\mathbf x}(\vec v_1, \dots, \vec v_k)$ is
given by

$$\partial P^o_{\mathbf x}(\vec v_1, \dots, \vec v_k) = \sum_{i=1}^{k} (-1)^{i-1} \Big( P^o_{\mathbf x + \vec v_i}(\vec v_1, \dots, \widehat{\vec v_i}, \dots, \vec v_k) - P^o_{\mathbf x}(\vec v_1, \dots, \widehat{\vec v_i}, \dots, \vec v_k) \Big), \tag{6.6.13}$$

where a hat over a term indicates that it is being omitted.

This business of hats indicating an omitted term may seem complicated.
Recall that the boundary of an object always has one dimension less than the
object itself: the boundary of a disk is a curve, the boundary of a box consists
of the six rectangles making up its sides, and so on. The boundary of
a $k$-dimensional parallelogram is made up of $(k - 1)$-parallelograms, so omitting
a vector gives the right number of vectors. For the faces of the form
$P_{\mathbf x}(\vec v_1, \dots, \widehat{\vec v_i}, \dots, \vec v_k)$, each of the $k$ vectors has a turn at being omitted. (In
Figure 6.6.4, these faces are the three faces that include the point $\mathbf x$.) For
the faces of the type $P_{\mathbf x + \vec v_i}(\vec v_1, \dots, \widehat{\vec v_i}, \dots, \vec v_k)$, the omitted vector is the vector
added to the point $\mathbf x$.

Before the proof, let us give some examples, which should make the formula 
easier to read. 

Example 6.6.15 (The boundary of an oriented 1-parallelogram). The
boundary of $P^o_{\mathbf x}(\vec v)$ is

$$\partial P^o_{\mathbf x}(\vec v) = P_{\mathbf x + \vec v} - P_{\mathbf x}. \tag{6.6.14}$$

So the boundary of an oriented line segment is its end minus its beginning, as
you probably expect. △

Example 6.6.16 (The boundary of an oriented 2-parallelogram). A
look at Figure 6.6.5 will probably lead you to guess that the boundary of an
oriented parallelogram is

$$\underbrace{\partial P^o_{\mathbf x}(\vec v_1, \vec v_2)}_{\text{boundary}} = \underbrace{P^o_{\mathbf x}(\vec v_1)}_{\text{1st side}} + \underbrace{P^o_{\mathbf x + \vec v_1}(\vec v_2)}_{\text{2nd side}} - \underbrace{P^o_{\mathbf x + \vec v_2}(\vec v_1)}_{\text{3rd side}} - \underbrace{P^o_{\mathbf x}(\vec v_2)}_{\text{4th side}}, \tag{6.6.15}$$

which agrees with Proposition 6.6.14. △

Figure 6.6.5.

If you start at $\mathbf x$ in the direction of $\vec v_1$ and keep going around the
boundary of the parallelogram, you will find the sum in Equation
6.6.15. The last two edges of that sum are negative because you are
traveling against the direction of the vectors in question.


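Example 6.6.17 (Boundary of a cube). For the faces of a cube shown in
Figure 6.6.4 we have:

$$\begin{aligned}
\partial P^o_{\mathbf x}(\vec v_1, \vec v_2, \vec v_3) =\ &+ \Big( \underbrace{P^o_{\mathbf x + \vec v_1}(\vec v_2, \vec v_3)}_{\text{right side}} - \underbrace{P^o_{\mathbf x}(\vec v_2, \vec v_3)}_{\text{left side}} \Big) && (i = 1 \text{ so } (-1)^{i-1} = 1); \\
&- \Big( \underbrace{P^o_{\mathbf x + \vec v_2}(\vec v_1, \vec v_3)}_{\text{back}} - \underbrace{P^o_{\mathbf x}(\vec v_1, \vec v_3)}_{\text{front}} \Big) && (i = 2 \text{ so } (-1)^{i-1} = -1); \\
&+ \Big( \underbrace{P^o_{\mathbf x + \vec v_3}(\vec v_1, \vec v_2)}_{\text{top}} - \underbrace{P^o_{\mathbf x}(\vec v_1, \vec v_2)}_{\text{bottom}} \Big) && (i = 3 \text{ so } (-1)^{i-1} = 1).
\end{aligned} \tag{6.6.16}$$

△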


How many “faces” make up the boundary of a 4-parallelogram? What is each 
face? How would you describe the boundary following the format used for the 
cube in Figure 6.6.4? Check your answer below. 7 


The important consequence of preceding each term by $(-1)^{i-1}$ is
that the boundary of the boundary is 0. The boundary of the
boundary of a cube consists of the edges of each face, each edge
appearing twice, once positively and once negatively, so that the two
cancel.
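The formula of Proposition 6.6.14 is mechanical enough to program, and doing so lets us watch the boundary of the boundary vanish. The sketch below uses a purely combinatorial encoding invented for this illustration: a parallelogram $P_{\mathbf x + \sum_{j \in \text{offset}} \vec v_j}(\vec v_i,\ i \in \text{span})$ is stored as the pair `(offset, span)` of index tuples, and a formal sum of parallelograms is a dictionary from such pairs to integer coefficients.

```python
def boundary(offset, span):
    """Oriented boundary of one parallelogram, following Equation 6.6.13.
    The vector in 1-based position i of `span` is omitted with sign (-1)**(i-1);
    the face through x + v_i gets that sign, the face through x the opposite."""
    faces = {}
    for pos, idx in enumerate(span):           # pos = i - 1
        sign = (-1) ** pos
        rest = span[:pos] + span[pos + 1:]     # spanning vectors with v_idx omitted
        far = tuple(sorted(offset + (idx,)))   # base point shifted by v_idx
        faces[(far, rest)] = faces.get((far, rest), 0) + sign
        faces[(offset, rest)] = faces.get((offset, rest), 0) - sign
    return faces

def boundary_of_chain(chain):
    """Extend `boundary` linearly to a formal sum, dropping zero coefficients."""
    total = {}
    for (offset, span), coeff in chain.items():
        for face, sign in boundary(offset, span).items():
            total[face] = total.get(face, 0) + coeff * sign
    return {face: c for face, c in total.items() if c != 0}

cube_faces = boundary((), (1, 2, 3))
for (offset, span), coeff in sorted(cube_faces.items()):
    print(coeff, "base offset", offset, "spanned by", span)
print("boundary of boundary:", boundary_of_chain(cube_faces))
```

The same two functions answer the question about the 4-parallelogram: `boundary((), (1, 2, 3, 4))` lists its faces with their signs.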


Proof of Proposition 6.6.14. As in Example 6.6.2, denote by $M$ the
manifold of which $P^o_{\mathbf x}(\vec v_1, \dots, \vec v_k)$ is a piece-with-boundary. The boundary
$\partial P^o_{\mathbf x}(\vec v_1, \dots, \vec v_k)$ is composed of its $2k$ faces (four for a parallelogram, six for a
cube ...), each of the form

$$P_{\mathbf x + \vec v_i}(\vec v_1, \dots, \widehat{\vec v_i}, \dots, \vec v_k) \quad\text{or}\quad P_{\mathbf x}(\vec v_1, \dots, \widehat{\vec v_i}, \dots, \vec v_k), \tag{6.6.17}$$

where a hat over a term indicates that it is being omitted. The problem is to
show that the orientation of this boundary is consistent with Definition 6.6.9
of the oriented boundary of a piece-with-boundary.


Let $\omega \in A^k(M)$ define the orientation of $M$, so that $\omega(\vec v_1, \dots, \vec v_k) > 0$. At a
point of $P_{\mathbf x + \vec v_i}(\vec v_1, \dots, \widehat{\vec v_i}, \dots, \vec v_k)$, the vector $\vec v_i$ is outward pointing, whereas at
a point of $P_{\mathbf x}(\vec v_1, \dots, \widehat{\vec v_i}, \dots, \vec v_k)$, the vector $-\vec v_i$ is outward pointing, as shown
in Figure 6.6.6. Thus the standard orientation of $P_{\mathbf x + \vec v_i}(\vec v_1, \dots, \widehat{\vec v_i}, \dots, \vec v_k)$ is
consistent with the boundary orientation of $P^o_{\mathbf x}(\vec v_1, \dots, \vec v_i, \dots, \vec v_k)$ precisely if

$$\omega(\vec v_i, \vec v_1, \dots, \widehat{\vec v_i}, \dots, \vec v_k) > 0,$$

i.e., precisely if the permutation $\sigma_i$ on $k$ symbols which consists of taking the
$i$th element and putting it in first position is a positive permutation. But the
signature of $\sigma_i$ is $(-1)^{i-1}$, because you can obtain $\sigma_i$ by switching the $i$th
symbol first with the $(i - 1)$th, then the $(i - 2)$th, etc., and finally the first,
doing $i - 1$ transpositions. This explains why $P_{\mathbf x + \vec v_i}(\vec v_1, \dots, \widehat{\vec v_i}, \dots, \vec v_k)$ occurs
with sign $(-1)^{i-1}$.

A similar argument holds for $P_{\mathbf x}(\vec v_1, \dots, \widehat{\vec v_i}, \dots, \vec v_k)$. This oriented parallelogram
has orientation compatible with the boundary orientation precisely if
$\omega(-\vec v_i, \vec v_1, \dots, \widehat{\vec v_i}, \dots, \vec v_k) > 0$, which occurs if the permutation $\sigma_i$ is odd. This
explains why $P_{\mathbf x}(\vec v_1, \dots, \widehat{\vec v_i}, \dots, \vec v_k)$ occurs in the sum with sign $(-1)^i$. □

7 A 4-parallelogram has eight "faces," each of which is a 3-parallelogram (i.e., a
parallelepiped, for example a cube). A 4-parallelogram spanned by the vectors $\vec v_1, \vec v_2, \vec v_3$,
and $\vec v_4$, anchored at $\mathbf x$, is denoted $P_{\mathbf x}(\vec v_1, \vec v_2, \vec v_3, \vec v_4)$. The eight "faces" of its boundary
are

$P_{\mathbf x}(\vec v_2, \vec v_3, \vec v_4)$, $P_{\mathbf x}(\vec v_1, \vec v_3, \vec v_4)$, $P_{\mathbf x}(\vec v_1, \vec v_2, \vec v_4)$, $P_{\mathbf x}(\vec v_1, \vec v_2, \vec v_3)$,
$P_{\mathbf x + \vec v_1}(\vec v_2, \vec v_3, \vec v_4)$, $P_{\mathbf x + \vec v_2}(\vec v_1, \vec v_3, \vec v_4)$, $P_{\mathbf x + \vec v_3}(\vec v_1, \vec v_2, \vec v_4)$, $P_{\mathbf x + \vec v_4}(\vec v_1, \vec v_2, \vec v_3)$.

Figure 6.6.6.

The vector $\vec v_1$ anchored at $\mathbf x$ is identical to $P_{\mathbf x}(\vec v_1)$. On
this segment of the boundary of the parallelogram $P_{\mathbf x}(\vec v_1, \vec v_2)$, the
outward-pointing vector is $-\vec v_2$. The top edge of the parallelogram
is $P_{\mathbf x + \vec v_2}(\vec v_1)$; on this edge, the outward-pointing vector is $+\vec v_2$.
(We have shortened the outward and inward pointing vectors for
the purpose of the drawing.)
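The sign count in the proof of Proposition 6.6.14 rests on one fact: moving the $i$th of $k$ symbols to first position is a permutation of signature $(-1)^{i-1}$. The sketch below checks this by counting inversions; the function names are hypothetical and chosen only for this illustration.

```python
def signature(perm):
    """Signature of a permutation, given as a sequence of distinct integers,
    computed from the parity of its number of inversions."""
    n = len(perm)
    inversions = sum(1 for a in range(n) for b in range(a + 1, n)
                     if perm[a] > perm[b])
    return -1 if inversions % 2 else 1

def move_to_front(i, k):
    """The ordering (i, 1, ..., i-1, i+1, ..., k) produced by sigma_i."""
    return [i] + [j for j in range(1, k + 1) if j != i]

k = 6
for i in range(1, k + 1):
    print(i, move_to_front(i, k), signature(move_to_front(i, k)))
```

Each printed signature alternates with $i$, starting at $+1$ for $i = 1$, which is exactly the sign pattern attached to the faces through $\mathbf x + \vec v_i$.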


In which we differentiate forms.

Now we come to the construction that gives the theory of forms its power, 
making possible a fundamental theorem of calculus in higher dimensions. We 
have already discussed integrals for forms. A derivative for forms also exists. 
This derivative, often called the exterior derivative , generalizes the derivative 
of ordinary functions. We will first discuss the exterior derivative in general; 
later we will see that the three differential operators of vector calculus (div, 
curl, and grad) are embodiments of the exterior derivative. 


Reinterpreting the derivative 


Equations 6.7.1 and 6.7.2 say the same thing in different words.
In the first we are evaluating $f$ at the two points $x + h$ and $x$. In
the second we are integrating $f$ over the boundary of the segment
$P_x(h)$.

What is the ordinary derivative? Of course, you know that

$$f'(x) = \lim_{h \to 0} \frac{1}{h}\big(f(x + h) - f(x)\big), \tag{6.7.1}$$

but we will reinterpret this formula as

$$f'(x) = \lim_{h \to 0} \frac{1}{h} \int_{\partial P^o_x(h)} f. \tag{6.7.2}$$

What does this mean? We are just using different words and different notation
to describe the same operation. Instead of saying that we are evaluating
$f$ at the two points $x + h$ and $x$, we say that we are integrating the 0-form $f$
over the boundary of the oriented segment $[x, x + h] = P^o_x(h)$. This boundary
consists of the two oriented points $+P_{x+h}$ and $-P_x$. The first point is the
endpoint of $P^o_x(h)$, and the second its beginning point; the beginning point is
taken with a minus sign, to indicate the orientation of the segment. Integrating
the 0-form $f$ over these two oriented points means evaluating $f$ on those points
(Example 6.5.22). So Equations 6.7.1 and 6.7.2 say exactly the same thing.

It may seem absurd to take Equation 6.7.1, which everyone understands 
perfectly well, and turn it into Equation 6.7.2, which is apparently just a more 
complicated way of saying exactly the same thing. But the language generalizes 
nicely to forms. 
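To see that the reinterpretation is only a change of language, here is a small sketch in which the difference quotient is literally computed as an integral of a 0-form over two oriented points. The representation of an oriented point as a `(sign, point)` pair and all function names are assumptions made for this illustration.

```python
import math

def boundary_of_segment(x, h):
    """Oriented boundary of the segment [x, x + h]: endpoint with sign +1,
    beginning point with sign -1."""
    return [(+1, x + h), (-1, x)]

def integrate_0form(f, oriented_points):
    """Integrating a 0-form over oriented points means evaluating it,
    with the sign given by the orientation."""
    return sum(sign * f(p) for sign, p in oriented_points)

def derivative_via_boundary(f, x, h=1e-6):
    """The quotient in Equation 6.7.2 for a small but nonzero h."""
    return integrate_0form(f, boundary_of_segment(x, h)) / h

print(derivative_via_boundary(math.sin, 1.0), math.cos(1.0))
```

Nothing here is specific to dimension one except the boundary routine; replacing it by the boundary of a higher-dimensional parallelogram is the idea behind the exterior derivative.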


Defining the exterior derivative 

The exterior derivative $d$ is an operator that takes a $k$-form $\varphi$ and gives a
$(k + 1)$-form, $d\varphi$. Since a $(k + 1)$-form takes an oriented $(k + 1)$-dimensional
parallelogram and gives a number, to define the exterior derivative of a $k$-form
$\varphi$, we must say what number it gives when evaluated on an oriented
$(k + 1)$-parallelogram.


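Compare the definition of the exterior derivative and Equation
6.7.2 for the ordinary derivative:

$$f'(x) = \lim_{h \to 0} \frac{1}{h} \int_{\partial P^o_x(h)} f.$$

One thing that makes Equation 6.7.3 hard to read is that the
expression for the boundary is so long that one might almost miss
the $\varphi$ at the end. We are integrating the $k$-form $\varphi$ over the boundary,
just as in Equation 6.7.2 we are integrating $f$ over the boundary.

Definition 6.7.1 (Exterior derivative). The exterior derivative $d$ of a
$k$-form $\varphi$, denoted $d\varphi$, takes a $(k + 1)$-parallelogram and returns a number, as
follows:

$$\underbrace{d\varphi}_{(k+1)\text{-form}}\Big(\underbrace{P^o_{\mathbf x}(\vec v_1, \dots, \vec v_{k+1})}_{(k+1)\text{-parallelogram}}\Big) = \lim_{h \to 0} \frac{1}{h^{k+1}} \underbrace{\int_{\partial P^o_{\mathbf x}(h\vec v_1, \dots, h\vec v_{k+1})} \varphi}_{\text{integrating } \varphi \text{ over boundary}}. \tag{6.7.3}$$

Here $\partial P^o_{\mathbf x}(h\vec v_1, \dots, h\vec v_{k+1})$ is the boundary of the $(k + 1)$-parallelogram,
which gets smaller and smaller as $h \to 0$.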

This isn't a formula that you just look at and say "got it." We will work
quite hard to see what the exterior derivative gives in particular cases, and to see
how to compute it. That the limit exists at all isn't obvious. Nor is it obvious
that the exterior derivative is a $(k + 1)$-form: we can see that $d\varphi$ is a function
of $k + 1$ vectors, but it's not obvious that it is multilinear and alternating. Two
of Maxwell's equations say that a certain 2-form on $\mathbb{R}^4$ has exterior derivative
zero; a course in electromagnetism might well spend six months trying to really
understand what this means. But observe that the definition makes sense:
$P^o_{\mathbf x}(\vec v_1, \dots, \vec v_{k+1})$ is $(k + 1)$-dimensional, its boundary is $k$-dimensional, so it is
something over which we can integrate the $k$-form $\varphi$.

Notice also that when $k = 0$, this boils down to Equation 6.7.1, as restated
in Equation 6.7.2.


Remark 6.7.2. Here we see why we had to define the boundary of a piece-with-boundary
as we did in Definition 6.6.9. The faces of the $(k + 1)$-parallelogram
$P^o_{\mathbf x}(\vec v_1, \dots, \vec v_{k+1})$ are $k$-dimensional. Multiplying the edges of these faces by $h$
should multiply the integral over each face by $h^k$. So it may seem that the limit
above should not exist, because the individual terms behave like $h^k / h^{k+1} = 1/h$.
But the limit does exist, because the faces come in pairs with opposite
orientation, according to Equation 6.6.13, and the terms in $h^k$ from each pair
cancel, leaving something of order $h^{k+1}$.

This cancellation is absolutely essential for a derivative to exist; that is why
we have put so much emphasis on orientation. △

We said earlier that to generalize the fundamental theorem of
calculus to higher dimensions we needed a theory of integration over
oriented domains. This is why.


Computing the exterior derivative 

Theorem 6.7.3 shows how to take the exterior derivative of any $k$-form. This
is a big theorem, one of the major results of the subject.

Theorem 6.7.3 (Computing the exterior derivative of a $k$-form).

(a) If the coefficients $a$ of the $k$-form

$$\varphi = \sum_{1 \le i_1 < \dots < i_k \le n} a_{i_1, \dots, i_k}\, dx_{i_1} \wedge \dots \wedge dx_{i_k} \tag{6.7.4}$$

are $C^2$ functions on $U \subset \mathbb{R}^n$, then the limit in Equation 6.7.3 exists, and
defines a $(k + 1)$-form.

(b) The exterior derivative is linear over $\mathbb{R}$: if $\varphi$ and $\psi$ are $k$-forms on
$U \subset \mathbb{R}^n$, and $a$ and $b$ are numbers (not functions), then

$$d(a\varphi + b\psi) = a\,d\varphi + b\,d\psi. \tag{6.7.5}$$

(c) The exterior derivative of a constant form is 0.

(d) The exterior derivative of the 0-form (i.e., function) $f$ is given by the
formula

$$df = [Df] = \sum_{i=1}^{n} (D_i f)\, dx_i. \tag{6.7.6}$$

(e) If $f$ is a function, then

$$d(f\, dx_{i_1} \wedge \dots \wedge dx_{i_k}) = df \wedge dx_{i_1} \wedge \dots \wedge dx_{i_k}. \tag{6.7.7}$$

Theorem 6.7.3 is proved in Appendix A.20.

These rules allow you to compute the exterior derivative of any fc-form, as 
shown below for any fc-form and as illustrated in the margin: 
$$
\begin{aligned}
d\varphi &= \underbrace{d\Bigl(\sum_{1 \le i_1 < \dots < i_k \le n} a_{i_1,\dots,i_k}\, dx_{i_1} \wedge \dots \wedge dx_{i_k}\Bigr)}_{\text{writing } \varphi \text{ in full}} \\
&\overset{(b)}{=} \underbrace{\sum_{1 \le i_1 < \dots < i_k \le n} d\bigl(a_{i_1,\dots,i_k}\, dx_{i_1} \wedge \dots \wedge dx_{i_k}\bigr)}_{\text{exterior derivative of sum equals sum of exterior derivatives}} \\
&\overset{(e)}{=} \underbrace{\sum_{1 \le i_1 < \dots < i_k \le n} (da_{i_1,\dots,i_k}) \wedge dx_{i_1} \wedge \dots \wedge dx_{i_k}}_{\text{problem reduced to computing ext. deriv. of function}}
\end{aligned}
\tag{6.7.8}
$$

Going from the first to the second line reduces the computation to computing exterior derivatives of elementary forms; going from the second to the third line reduces the computation to computing exterior derivatives of functions. In applying (e) we think of the coefficients $a_{i_1,\dots,i_k}$ as the function $f$.

The first line of Equation 6.7.8 just says that $d\varphi = d\varphi$; for example, if $\varphi = f(dx \wedge dy) + g(dy \wedge dz)$, then

$$d\varphi = d\bigl(f(dx \wedge dy) + g(dy \wedge dz)\bigr).$$

The second line says that the exterior derivative of the sum is the sum of the exterior derivatives. For example:

$$d\bigl(f(dx \wedge dy) + g(dy \wedge dz)\bigr) = d\bigl(f(dx \wedge dy)\bigr) + d\bigl(g(dy \wedge dz)\bigr).$$

The third line says that

$$d\bigl(f(dx \wedge dy)\bigr) = df \wedge dx \wedge dy, \qquad d\bigl(g(dy \wedge dz)\bigr) = dg \wedge dy \wedge dz.$$

We compute the exterior derivative of the function $f = a_{i_1,\dots,i_k}$ from part (d):

$$da_{i_1,\dots,i_k} = \sum_{j=1}^n D_j a_{i_1,\dots,i_k}\, dx_j. \tag{6.7.9}$$

For example, if $f$ and $g$ are functions in the three variables $x$, $y$ and $z$, then

$$df = D_1f\, dx + D_2f\, dy + D_3f\, dz, \tag{6.7.10}$$

so

$$
\begin{aligned}
df \wedge dx \wedge dy &= (D_1f\, dx + D_2f\, dy + D_3f\, dz) \wedge dx \wedge dy \\
&= \underbrace{D_1f\, dx \wedge dx \wedge dy}_{0} + \underbrace{D_2f\, dy \wedge dx \wedge dy}_{0} + D_3f\, dz \wedge dx \wedge dy \\
&= D_3f\, dz \wedge dx \wedge dy = D_3f\, dx \wedge dy \wedge dz.
\end{aligned}
\tag{6.7.11}
$$

The first term in the second line of Equation 6.7.11 is 0 because it contains two $dx$'s; the second because it contains two $dy$'s. Since exchanging two terms changes the sign of the wedge product, exchanging two identical terms changes the sign while leaving it unchanged, so the product must be 0. With a bit of practice computing exterior derivatives you will learn to ignore wedge products that contain two identical terms.

You will usually want to put the $dx_i$ in ascending order, which may change the sign, as in the third line of Equation 6.7.12. The sign is not changed in the last line of Equation 6.7.11, because two exchanges are required.


Example 6.7.4 (Computing the exterior derivative of an elementary 2-form on $\mathbb R^4$). Computing the exterior derivative of $x_2x_3\,(dx_2 \wedge dx_4)$ gives

$$
\begin{aligned}
d(x_2x_3) \wedge dx_2 \wedge dx_4
&= \underbrace{\bigl(\underbrace{D_1(x_2x_3)}_{0}\, dx_1 + D_2(x_2x_3)\, dx_2 + D_3(x_2x_3)\, dx_3 + \underbrace{D_4(x_2x_3)}_{0}\, dx_4\bigr)}_{d(x_2x_3)} \wedge\, dx_2 \wedge dx_4 \\
&= (x_3\, dx_2 + x_2\, dx_3) \wedge dx_2 \wedge dx_4 = (x_3\, dx_2 \wedge dx_2 \wedge dx_4) + (x_2\, dx_3 \wedge dx_2 \wedge dx_4) \\
&= \underbrace{x_2\,(dx_3 \wedge dx_2 \wedge dx_4)}_{dx\text{'s out of order}} = \underbrace{-x_2\,(dx_2 \wedge dx_3 \wedge dx_4)}_{\text{sign changes as order is corrected}}. \quad \triangle
\end{aligned}
\tag{6.7.12}
$$
Figure 6.7.1. The origin is your eye; the “solid angle” with which you see the surface $S$ is the cone shown above. The intersection of the cone and the sphere of radius 1 around your eye is the region $P$. The integral of the 2-form $\Phi_{\vec F_3}$ over $S$ is the same as its integral over $P$, as you are asked to prove in Exercise 6.9.8, in the case where $S$ is a parallelogram.

Figure 6.7.2. The vector field $\vec F_2$ of Example 6.7.6 points straight out from the origin. The flux of this through the unit circle is positive.

What is the exterior derivative of the 2-form on $\mathbb R^3$ $x_1x_3^2\, dx_1 \wedge dx_2$? Check your answer below.⁸

Example 6.7.5 (Computing the exterior derivative of a 2-form). Compute the exterior derivative of the 2-form on $\mathbb R^4$,

$$\psi = x_1x_2\, dx_2 \wedge dx_4 - x_2^2\, dx_3 \wedge dx_4, \tag{6.7.13}$$

which is the sum of two elementary 2-forms. We have

$$
\begin{aligned}
d\psi &= d(x_1x_2\, dx_2 \wedge dx_4) - d(x_2^2\, dx_3 \wedge dx_4) \\
&= \bigl(D_1(x_1x_2)\, dx_1 + D_2(x_1x_2)\, dx_2 + D_3(x_1x_2)\, dx_3 + D_4(x_1x_2)\, dx_4\bigr) \wedge dx_2 \wedge dx_4 \\
&\quad - \bigl(D_1(x_2^2)\, dx_1 + D_2(x_2^2)\, dx_2 + D_3(x_2^2)\, dx_3 + D_4(x_2^2)\, dx_4\bigr) \wedge dx_3 \wedge dx_4 \\
&= (x_2\, dx_1 + x_1\, dx_2) \wedge dx_2 \wedge dx_4 - (2x_2\, dx_2 \wedge dx_3 \wedge dx_4) \\
&= x_2\, dx_1 \wedge dx_2 \wedge dx_4 + \underbrace{x_1\, dx_2 \wedge dx_2 \wedge dx_4}_{=\,0} - 2x_2\, dx_2 \wedge dx_3 \wedge dx_4 \\
&= x_2\, dx_1 \wedge dx_2 \wedge dx_4 - 2x_2\, dx_2 \wedge dx_3 \wedge dx_4. \quad \triangle
\end{aligned}
\tag{6.7.14}
$$
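The rules of Theorem 6.7.3 are mechanical enough to be sketched in a few lines of code. The following is only a numerical illustration, not part of the text's development: a $k$-form is stored as a dictionary from increasing index tuples to coefficient functions, partial derivatives are approximated by central differences, and all names (`partial`, `sort_with_sign`, `exterior_derivative_at`) are hypothetical. Indices are 0-based in the code, so $dx_1$ in the text is index 0.

```python
def partial(f, j, x, h=1e-5):
    """Central-difference estimate of D_j f at the point x (0-based j)."""
    xp, xm = list(x), list(x)
    xp[j] += h
    xm[j] -= h
    return (f(xp) - f(xm)) / (2 * h)


def sort_with_sign(indices):
    """Put the dx indices in ascending order.

    Returns (sorted tuple, sign), or (None, 0) if an index repeats,
    since a wedge product containing two identical terms is 0.
    """
    idx = list(indices)
    if len(set(idx)) < len(idx):
        return None, 0
    sign = 1
    for a in range(len(idx)):            # bubble sort: one sign flip per exchange
        for b in range(len(idx) - 1 - a):
            if idx[b] > idx[b + 1]:
                idx[b], idx[b + 1] = idx[b + 1], idx[b]
                sign = -sign
    return tuple(idx), sign


def exterior_derivative_at(form, x):
    """Coefficients of d(form) at the point x, following parts (b), (d), (e)."""
    result = {}
    for idx, coeff in form.items():      # (b): treat each elementary form separately
        for j in range(len(x)):          # (d) and (e): wedge D_j(coeff) dx_j in front
            new_idx, sign = sort_with_sign((j,) + idx)
            if sign != 0:
                value = sign * partial(coeff, j, x)
                result[new_idx] = result.get(new_idx, 0.0) + value
    return result


# psi = x1 x2 dx2^dx4 - x2^2 dx3^dx4 from Example 6.7.5, with 0-based indices
psi = {
    (1, 3): lambda x: x[0] * x[1],
    (2, 3): lambda x: -x[1] ** 2,
}
point = [1.0, 2.0, 3.0, 4.0]
d_psi = exterior_derivative_at(psi, point)
```

At the chosen point, the coefficient of $dx_1 \wedge dx_2 \wedge dx_4$ should be close to $x_2 = 2$ and the coefficient of $dx_2 \wedge dx_3 \wedge dx_4$ close to $-2x_2 = -4$, in agreement with Equation 6.7.14.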
Example 6.7.6 (Element of angle). The vector fields 



1 

x 2 + y 2 



and 



1 

(x 2 + y 2 + X 2 ) 3/2 



6.7.15 


satisfy the property that $dW_{\vec F_2} = 0$ and $d\Phi_{\vec F_3} = 0$. The forms $W_{\vec F_2}$ and $\Phi_{\vec F_3}$ can be called respectively the “element of polar angle” and the “element of solid angle”; the latter is depicted in Figure 6.7.1.

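We will now find the analogs in any dimension. Using again a hat to denote a term that is omitted in the product, our candidate is the $(n-1)$-form on $\mathbb R^n$:

$$\omega_n = \frac{1}{(x_1^2 + \dots + x_n^2)^{n/2}} \sum_{i=1}^n (-1)^{i-1} x_i\, dx_1 \wedge \dots \wedge \widehat{dx_i} \wedge \dots \wedge dx_n, \tag{6.7.16}$$

which can also be thought of as the flux of the vector field

$$\vec F_n = \frac{1}{(x_1^2 + \dots + x_n^2)^{n/2}} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}, \quad \text{which can be written} \quad \vec F_n(\vec x) = \frac{\vec x}{|\vec x|^n}. \tag{6.7.17}$$

⁸ $d(x_1x_3^2\, dx_1 \wedge dx_2) = d(x_1x_3^2) \wedge dx_1 \wedge dx_2$
$= D_1(x_1x_3^2)\, dx_1 \wedge dx_1 \wedge dx_2 + D_2(x_1x_3^2)\, dx_2 \wedge dx_1 \wedge dx_2 + D_3(x_1x_3^2)\, dx_3 \wedge dx_1 \wedge dx_2$
$= 2x_1x_3\, dx_3 \wedge dx_1 \wedge dx_2 = 2x_1x_3\, dx_1 \wedge dx_2 \wedge dx_3.$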
In going from the first to the second line we omitted the partial derivatives with respect to the variables that appear among the $dx$'s to the right; i.e., we compute only the partial derivative with respect to $x_i$, since $dx_i$ is the only $dx$ that doesn't appear on the right.

Going from the second to the third line is just putting the $dx_i$ in its proper position, which also gets rid of the $(-1)^{i-1}$; moving $dx_i$ into its proper position requires $i - 1$ transpositions.

The fourth equal sign is merely calculating the partial derivative and the fifth involves factoring out $(x_1^2 + \dots + x_n^2)^{n/2-1}$ from the numerator and canceling with the same factor in the denominator.


It is clear from the second description that the integral of the flux of this vector field over the unit sphere $S^{n-1}$ is positive; at every point, this vector field points outwards, as shown for $n = 2$ in Figure 6.7.2. In fact, the flux is equal to the $(n-1)$-dimensional volume of $S^{n-1}$.

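The computation in Equation 6.7.18 below shows that $d\omega_n = 0$:

$$
\begin{aligned}
d\omega_n &= d\Bigl(\frac{1}{(x_1^2 + \dots + x_n^2)^{n/2}} \sum_{i=1}^n (-1)^{i-1} x_i\, dx_1 \wedge \dots \wedge \widehat{dx_i} \wedge \dots \wedge dx_n\Bigr) \\
&= \sum_{i=1}^n (-1)^{i-1} D_i\Bigl(\frac{x_i}{(x_1^2 + \dots + x_n^2)^{n/2}}\Bigr)\, dx_i \wedge dx_1 \wedge \dots \wedge \widehat{dx_i} \wedge \dots \wedge dx_n \\
&= \sum_{i=1}^n D_i\Bigl(\frac{x_i}{(x_1^2 + \dots + x_n^2)^{n/2}}\Bigr)\, dx_1 \wedge \dots \wedge dx_n \\
&= \sum_{i=1}^n \Bigl(\frac{(x_1^2 + \dots + x_n^2)^{n/2} - n x_i^2 (x_1^2 + \dots + x_n^2)^{n/2-1}}{(x_1^2 + \dots + x_n^2)^{n}}\Bigr)\, dx_1 \wedge \dots \wedge dx_n \\
&= \sum_{i=1}^n \Bigl(\frac{x_1^2 + \dots + x_n^2 - n x_i^2}{(x_1^2 + \dots + x_n^2)^{n/2+1}}\Bigr)\, dx_1 \wedge \dots \wedge dx_n = 0.
\end{aligned}
\tag{6.7.18}
$$

We get the last equality because the numerators sum to zero. For instance, when $n = 2$ we have $x_1^2 + x_2^2 - 2x_1^2 + x_1^2 + x_2^2 - 2x_2^2 = 0$. △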
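The third line of Equation 6.7.18 says that the coefficient of $dx_1 \wedge \dots \wedge dx_n$ in $d\omega_n$ is $\sum_i D_i\bigl(x_i/|\vec x|^n\bigr)$, which is the divergence of the vector field of Equation 6.7.17. As a rough numerical check (a sketch only; the helper names `radial_field` and `divergence` and the sample points are assumptions made for illustration), one can estimate this sum by central differences in several dimensions and observe that it is zero up to discretization error.

```python
import math


def radial_field(x):
    """The vector field x / |x|^n of Equation 6.7.17, where n = len(x)."""
    n = len(x)
    r = math.sqrt(sum(t * t for t in x))
    return [t / r ** n for t in x]


def divergence(field, x, h=1e-5):
    """Central-difference estimate of sum_i D_i F_i at the point x."""
    total = 0.0
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        total += (field(xp)[i] - field(xm)[i]) / (2 * h)
    return total


# One sample point away from the origin in each of R^2, ..., R^5.
checks = {
    n: divergence(radial_field, [0.3 + 0.2 * i for i in range(n)])
    for n in (2, 3, 4, 5)
}
```

Each entry of `checks` should be very small in absolute value. The field is not defined at the origin, so the check says nothing there.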


In the double sum in Equation 6.7.19, the terms corresponding to $i = j$ vanish, since they are followed by $dx_i \wedge dx_i$. If $i \ne j$, the pair of terms

$$D_jD_if\, dx_j \wedge dx_i \quad \text{and} \quad D_iD_jf\, dx_i \wedge dx_j$$

cancel, since the crossed partials are equal, and $dx_j \wedge dx_i = -dx_i \wedge dx_j$.

The second equality in the second line of Equation 6.7.19 is part (d) of Theorem 6.7.3. Here $D_if$ plays the role of $f$ in part (d), giving

$$d(D_if) = \sum_{j=1}^n D_jD_if\, dx_j;$$

we have $j$ in the subscript rather than $i$, since $i$ is already taken.

Taking the exterior derivative twice

The exterior derivative of a $k$-form is a $(k+1)$-form; the exterior derivative of that $(k+1)$-form is a $(k+2)$-form. One remarkable property of the exterior derivative is that if you take it twice, you always get 0. (To be precise, we must specify that $\varphi$ be twice continuously differentiable.)

Theorem 6.7.7. For any $k$-form $\varphi$ on $U \subset \mathbb R^n$ of class $C^2$, we have $d(d\varphi) = 0$.

Proof. This can just be computed out. Let us see it first for 0-forms:

$$
\begin{aligned}
ddf &= d\Bigl(\sum_{i=1}^n D_if\, dx_i\Bigr) = \sum_{i=1}^n d(D_if\, dx_i) \\
&= \sum_{i=1}^n d(D_if) \wedge dx_i = \sum_{i=1}^n \sum_{j=1}^n D_jD_if\, dx_j \wedge dx_i = 0.
\end{aligned}
\tag{6.7.19}
$$

If $k > 0$, it is enough to make the following computation:

$$
d\bigl(d(f\, dx_{i_1} \wedge \dots \wedge dx_{i_k})\bigr) = d(df \wedge dx_{i_1} \wedge \dots \wedge dx_{i_k}) = \underbrace{(ddf)}_{=\,0} \wedge\, dx_{i_1} \wedge \dots \wedge dx_{i_k} = 0. \quad \square
\tag{6.7.20}
$$
There is also a conceptual proof of Theorem 6.7.7. Suppose $\varphi$ is a $k$-form and you want to evaluate $d(d\varphi)$ on $k + 2$ vectors. We get $d(d\varphi)$ by integrating $d\varphi$ over the boundary of the oriented $(k+2)$-parallelogram spanned by the vectors, i.e., by integrating $\varphi$ over the boundary of the boundary. But what is the boundary of the boundary? It is empty! One way of saying this is that each face of the $(k+2)$-parallelogram is a $(k+1)$-dimensional parallelogram, and each edge of the $(k+1)$-parallelogram is also the edge of another $(k+1)$-parallelogram, but with opposite orientation, as Figure 6.7.3 suggests for $k = 1$, and as Exercise 6.7.8 asks you to prove.

Figure 6.7.3. Each edge of the cube is an edge of two faces of the cube, and is taken twice, with opposite orientations.

6.8 The Exterior Derivative in the Language of Vector Calculus

The operators grad, div, and curl are the workhorses of vector calculus. We will see that they are three different incarnations of the exterior derivative.

The gradient associates a vector field to a function. The curl associates a vector field to a vector field, and the divergence associates a function to a vector field.

We denote by the symbol $\vec\nabla$ (“nabla”) the operator

$$\vec\nabla = \begin{bmatrix} D_1 \\ D_2 \\ D_3 \end{bmatrix}.$$

Some authors call $\vec\nabla$ “del.”

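Definition 6.8.1 (Grad, curl and div). Let $f : U \to \mathbb R$ be a $C^1$ function on an open set $U \subset \mathbb R^n$, and let $\vec F$ be a $C^1$ vector field on $U$. Then the grad of a function, the curl of a vector field, and the div of a vector field, are given by the formulas below:

$$\operatorname{grad} f = \begin{bmatrix} D_1f \\ D_2f \\ D_3f \end{bmatrix} = \vec\nabla f$$

$$\operatorname{curl} \vec F = \operatorname{curl} \begin{bmatrix} F_1 \\ F_2 \\ F_3 \end{bmatrix} = \vec\nabla \times \vec F = \underbrace{\begin{bmatrix} D_1 \\ D_2 \\ D_3 \end{bmatrix} \times \begin{bmatrix} F_1 \\ F_2 \\ F_3 \end{bmatrix}}_{\text{cross product of } \vec\nabla \text{ and } \vec F} = \begin{bmatrix} D_2F_3 - D_3F_2 \\ D_3F_1 - D_1F_3 \\ D_1F_2 - D_2F_1 \end{bmatrix}$$

$$\operatorname{div} \vec F = \operatorname{div} \begin{bmatrix} F_1 \\ F_2 \\ F_3 \end{bmatrix} = \vec\nabla \cdot \vec F = \underbrace{\begin{bmatrix} D_1 \\ D_2 \\ D_3 \end{bmatrix} \cdot \begin{bmatrix} F_1 \\ F_2 \\ F_3 \end{bmatrix}}_{\text{dot product of } \vec\nabla \text{ and } \vec F} = D_1F_1 + D_2F_2 + D_3F_3.$$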


These operators all look kind of similar, some combination of partial derivatives. (Thus they are called differential operators.) We use the symbol $\vec\nabla$ to make it easier to remember the above formulas, which we can summarize:

$$\operatorname{grad} f = \vec\nabla f, \qquad \operatorname{curl} \vec F = \vec\nabla \times \vec F, \qquad \operatorname{div} \vec F = \vec\nabla \cdot \vec F. \tag{6.8.1}$$

Note that both the grad of a function, and the curl of a vector field, are vector fields, while the div of a vector field is a function.

Mnemonic: Both “curl” and “cross product” start with “c”; both “divergence” and “dot product” start with “d”.



We will use the words “grad,” “curl,” and “div” and the corresponding formulas of Equation 6.8.1 interchangeably, both in text and in equations. This may seem confusing at first, but it is important to learn both to see the name and think the formula, and to see the formula and think the name.


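Example 6.8.2 (Curl and div). Let $\vec F$ be the vector field

$$\vec F\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{bmatrix} -z \\ xz^2 \\ x + y \end{bmatrix}, \quad \text{i.e., } F_1 = -z,\ F_2 = xz^2,\ F_3 = x + y.$$

The partial derivative $D_2F_3$ is the derivative with respect to the second variable of the function $F_3$, i.e., $D_2(x + y) = 1$. Continuing in this fashion we get

$$\operatorname{curl} \begin{bmatrix} -z \\ xz^2 \\ x + y \end{bmatrix} = \begin{bmatrix} D_2(x + y) - D_3(xz^2) \\ D_3(-z) - D_1(x + y) \\ D_1(xz^2) - D_2(-z) \end{bmatrix} = \begin{bmatrix} 1 - 2xz \\ -2 \\ z^2 \end{bmatrix}.$$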

The divergence of the vector field F 



is 1 -f x 2 z + y. 


6.8.2 


A 
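The curl formula of Definition 6.8.1 is easy to check numerically. The sketch below is an illustration only, with hypothetical helper names (`partial`, `curl`) and an arbitrarily chosen sample point; it applies central differences to the field $F_1 = -z$, $F_2 = xz^2$, $F_3 = x + y$ used in the curl computation of Example 6.8.2 and compares the result with $(1 - 2xz,\ -2,\ z^2)$.

```python
def partial(g, j, p, h=1e-5):
    """Central-difference estimate of D_j g at the point p (0-based j)."""
    pp, pm = list(p), list(p)
    pp[j] += h
    pm[j] -= h
    return (g(pp) - g(pm)) / (2 * h)


def curl(F, p):
    """Numerical curl of a vector field F: R^3 -> R^3 at the point p."""
    def D(j, i):
        # D_j F_i, with 0-based indices (so D(1, 2) is D_2 F_3 in the text)
        return partial(lambda q: F(q)[i], j, p)
    return [D(1, 2) - D(2, 1), D(2, 0) - D(0, 2), D(0, 1) - D(1, 0)]


def F(p):
    x, y, z = p
    return [-z, x * z ** 2, x + y]


p = [1.0, 2.0, 3.0]
numeric = curl(F, p)
exact = [1 - 2 * p[0] * p[2], -2.0, p[2] ** 2]   # the formula from the example
```

Because the components of this field are polynomials of degree at most 2 in each variable, central differences reproduce the partial derivatives essentially exactly, so `numeric` and `exact` should agree to many digits.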


What is the grad of the function $f = x^2y + z$? What are the curl and div of the vector field $\vec F = \begin{bmatrix} -y \\ x \\ xz \end{bmatrix}$? Check your answers below.⁹

The following theorem relates the exterior derivative to the work, flux and density form fields.

Theorem 6.8.3 (Exterior derivative of form fields on $\mathbb R^3$). Let $f$ be a function on $\mathbb R^3$ and let $\vec F$ be a vector field. Then we have the following three formulas:

(a) $df = W_{\vec\nabla f}$, i.e., $df$ is the work form field of grad $f$,

(b) $dW_{\vec F} = \Phi_{\vec\nabla \times \vec F}$, i.e., $dW_{\vec F}$ is the flux form field of curl $\vec F$,

(c) $d\Phi_{\vec F} = \rho_{\vec\nabla \cdot \vec F}$, i.e., $d\Phi_{\vec F}$ is the density form field of div $\vec F$.

To compute the exterior derivative of a function $f$ one can compute grad $f$. To compute the exterior derivative of the work form field of $\vec F$ one can compute curl $\vec F$. To compute the exterior derivative of the flux form field of $\vec F$ one can compute div $\vec F$.


Example 6.8.4 (Equivalence of $df$ and $W_{\operatorname{grad} f}$). In the language of forms, to compute the exterior derivative of a function in $\mathbb R^3$, we can use part (d) of Theorem 6.7.3 to compute $d$ of the 0-form $f$:

$$df = D_1f\, dx_1 + D_2f\, dx_2 + D_3f\, dx_3. \tag{6.8.3}$$


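⁹ $\operatorname{grad} f = \begin{bmatrix} 2xy \\ x^2 \\ 1 \end{bmatrix}$; $\quad \operatorname{curl} \vec F = \begin{bmatrix} D_1 \\ D_2 \\ D_3 \end{bmatrix} \times \begin{bmatrix} -y \\ x \\ xz \end{bmatrix} = \begin{bmatrix} 0 \\ -z \\ 2 \end{bmatrix}$; $\quad \operatorname{div} \vec F = \begin{bmatrix} D_1 \\ D_2 \\ D_3 \end{bmatrix} \cdot \begin{bmatrix} -y \\ x \\ xz \end{bmatrix} = x.$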



Note that writing

$$df = D_1f\, dx_1 + D_2f\, dx_2 + D_3f\, dx_3$$

is exactly the same as writing

$$df = [Df] = [D_1f, D_2f, D_3f].$$

Both are linear transformations from $\mathbb R^3 \to \mathbb R$, and evaluated on a vector $\vec v$, they give the same result:

$$(D_1f\, dx_1 + D_2f\, dx_2 + D_3f\, dx_3)(\vec v) = D_1f\, v_1 + D_2f\, v_2 + D_3f\, v_3,$$

and

$$[D_1f, D_2f, D_3f] \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = D_1f\, v_1 + D_2f\, v_2 + D_3f\, v_3.$$

Here we write $\Phi_{\vec G}$ to avoid confusion with our vector field $\vec F = \begin{bmatrix} xy \\ z \\ yz \end{bmatrix}$.

Exercise 6.8.3 asks you to work out a similar example showing the equivalence of $d\Phi_{\vec F}$ and $\rho_{\vec\nabla \cdot \vec F}$.
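Evaluated on the vector $\vec v = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$, this 1-form gives

$$df(\vec v) = D_1f\, v_1 + D_2f\, v_2 + D_3f\, v_3.$$

In the language of vector calculus, we can compute

$$W_{\operatorname{grad} f} = W_{\vec\nabla f} = W_{\begin{bmatrix} D_1f \\ D_2f \\ D_3f \end{bmatrix}}, \tag{6.8.4}$$

which evaluated on $\vec v$ gives

$$W_{\vec\nabla f}(\vec v) = \begin{bmatrix} D_1f \\ D_2f \\ D_3f \end{bmatrix} \cdot \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = D_1f\, v_1 + D_2f\, v_2 + D_3f\, v_3. \quad \triangle \tag{6.8.5}$$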


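Example 6.8.5 (Equivalence of $dW_{\vec F}$ and $\Phi_{\operatorname{curl} \vec F}$). Let us compute the exterior derivative of the 1-form in $\mathbb R^3$

$$xy\, dx + z\, dy + yz\, dz, \quad \text{i.e., } W_{\vec F}, \text{ when } \vec F = \begin{bmatrix} xy \\ z \\ yz \end{bmatrix}.$$

In the language of forms,

$$
\begin{aligned}
d(xy\, dx + z\, dy + yz\, dz) &= d(xy) \wedge dx + d(z) \wedge dy + d(yz) \wedge dz \\
&= (D_1xy\, dx + D_2xy\, dy + D_3xy\, dz) \wedge dx + (D_1z\, dx + D_2z\, dy + D_3z\, dz) \wedge dy \\
&\quad + (D_1yz\, dx + D_2yz\, dy + D_3yz\, dz) \wedge dz \\
&= -x\,(dx \wedge dy) + (z - 1)(dy \wedge dz).
\end{aligned}
\tag{6.8.6}
$$

Since any 2-form in $\mathbb R^3$ can be written $\Phi_{\vec G} = G_1\, dy \wedge dz - G_2\, dx \wedge dz + G_3\, dx \wedge dy$, the last line of Equation 6.8.6 can be written $\Phi_{\vec G}$ for $\vec G = \begin{bmatrix} z - 1 \\ 0 \\ -x \end{bmatrix}$. This vector field is precisely the curl of $\vec F$:

$$\vec\nabla \times \vec F = \begin{bmatrix} D_1 \\ D_2 \\ D_3 \end{bmatrix} \times \begin{bmatrix} xy \\ z \\ yz \end{bmatrix} = \begin{bmatrix} D_2yz - D_3z \\ -D_1yz + D_3xy \\ D_1z - D_2xy \end{bmatrix} = \begin{bmatrix} z - 1 \\ 0 \\ -x \end{bmatrix}. \quad \triangle$$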


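Proof of Theorem 6.8.3. The proof simply consists of using symbolic entries rather than the specific ones of Examples 6.8.4 and 6.8.5 and Exercise 6.8.3. For part (a), we find

$$df = D_1f\, dx + D_2f\, dy + D_3f\, dz = W_{\begin{bmatrix} D_1f \\ D_2f \\ D_3f \end{bmatrix}} = W_{\vec\nabla f}. \tag{6.8.7}$$

For part (b), a similar computation gives

$$
\begin{aligned}
dW_{\vec F} &= d(F_1\, dx + F_2\, dy + F_3\, dz) = dF_1 \wedge dx + dF_2 \wedge dy + dF_3 \wedge dz \\
&= (D_1F_1\, dx + D_2F_1\, dy + D_3F_1\, dz) \wedge dx \\
&\quad + (D_1F_2\, dx + D_2F_2\, dy + D_3F_2\, dz) \wedge dy \\
&\quad + (D_1F_3\, dx + D_2F_3\, dy + D_3F_3\, dz) \wedge dz \\
&= (D_1F_2 - D_2F_1)\, dx \wedge dy + (D_1F_3 - D_3F_1)\, dx \wedge dz + (D_2F_3 - D_3F_2)\, dy \wedge dz \\
&= \Phi_{\begin{bmatrix} D_2F_3 - D_3F_2 \\ D_3F_1 - D_1F_3 \\ D_1F_2 - D_2F_1 \end{bmatrix}} = \Phi_{\vec\nabla \times \vec F}.
\end{aligned}
\tag{6.8.8}
$$

For part (c), the computation gives

$$
\begin{aligned}
d\Phi_{\vec F} &= d(F_1\, dy \wedge dz + F_2\, dz \wedge dx + F_3\, dx \wedge dy) \\
&= (D_1F_1\, dx + D_2F_1\, dy + D_3F_1\, dz) \wedge dy \wedge dz \\
&\quad + (D_1F_2\, dx + D_2F_2\, dy + D_3F_2\, dz) \wedge dz \wedge dx \\
&\quad + (D_1F_3\, dx + D_2F_3\, dy + D_3F_3\, dz) \wedge dx \wedge dy \\
&= (D_1F_1 + D_2F_2 + D_3F_3)\, dx \wedge dy \wedge dz = \rho_{\vec\nabla \cdot \vec F}. \quad \square
\end{aligned}
\tag{6.8.9}
$$

The work form field $W_{\vec F}$ of a vector field $\vec F$ is the 1-form field

$$W_{\vec F}\bigl(P^o_{\mathbf x}(\vec v)\bigr) = \vec F(\mathbf x) \cdot \vec v.$$

So Theorem 6.8.3 part (a) says

$$df\bigl(P^o_{\mathbf x}(\vec v)\bigr) = W_{\vec\nabla f}\bigl(P^o_{\mathbf x}(\vec v)\bigr) = \vec\nabla f(\mathbf x) \cdot \vec v.$$

The flux form field $\Phi_{\vec F}$ of a vector field $\vec F$ is the 2-form field

$$\Phi_{\vec F}\bigl(P^o_{\mathbf x}(\vec v_1, \vec v_2)\bigr) = \det[\vec F(\mathbf x), \vec v_1, \vec v_2].$$

So part (b) says

$$dW_{\vec F}\bigl(P^o_{\mathbf x}(\vec v_1, \vec v_2)\bigr) = \det[(\vec\nabla \times \vec F)(\mathbf x), \vec v_1, \vec v_2].$$

The density form field $\rho_f$ of a function $f$ is the 3-form field

$$\rho_f\bigl(P^o_{\mathbf x}(\vec v_1, \vec v_2, \vec v_3)\bigr) = f(\mathbf x) \det[\vec v_1, \vec v_2, \vec v_3].$$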


Theorem 6.8.3 says that the three incarnations of the exterior derivative in 
R 3 are precisely grad, curl, and div. Grad goes from 0-form fields to 1-form 
fields, curl goes from 1-form fields to 2-form fields, and div goes from 2-form 
fields to 3-form fields. This is summarized by the diagram in Figure 6.8.1, which 
you should learn. 


So part (c) of Theorem 6.8.3 says

$$d\Phi_{\vec F}\bigl(P^o_{\mathbf x}(\vec v_1, \vec v_2, \vec v_3)\bigr) = (\vec\nabla \cdot \vec F)(\mathbf x) \det[\vec v_1, \vec v_2, \vec v_3].$$

The diagram of Figure 6.8.1 commutes; if you start anywhere on the left, and go down and right, you will get the same answer as you get going first right and then down.

Figure 6.8.1. In $\mathbb R^3$, 0-form fields and 3-form fields can be identified with functions, and 1-form fields and 2-form fields can be identified with vector fields. The operators grad, curl, and div are three incarnations of the exterior derivative $d$, which takes a $k$-form field and gives a $(k+1)$-form field.
Geometric interpretation of the exterior derivative in $\mathbb R^3$

We already knew how to compute the exterior derivative of any $k$-form, and we had an interpretation of the exterior derivative of a $k$-form $\varphi$ as integrating $\varphi$ over the oriented boundary of a $(k+1)$-parallelogram. Why did we bring in grad, curl and div?

One reason is that being familiar with grad, curl, and div is essential in many physics and engineering courses. Another is that they give a different perspective on the exterior derivative in $\mathbb R^3$, with which many people are more comfortable.

Geometric interpretation of the gradient

The gradient of a function, abbreviated grad, looks a lot like the Jacobian matrix. Clearly $\operatorname{grad} f(\mathbf x) = [Df(\mathbf x)]^{\mathsf T}$; the gradient is gotten simply by putting the entries of the line matrix $[Df(\mathbf x)]$ in a column instead of a row. In particular,

$$\operatorname{grad} f(\mathbf x) \cdot \vec v = [Df(\mathbf x)]\vec v; \tag{6.8.10}$$

the dot product of $\vec v$ with the gradient is the directional derivative in the direction $\vec v$.

If $\theta$ is the angle between $\operatorname{grad} f(\mathbf x)$ and $\vec v$, we can write

$$\operatorname{grad} f(\mathbf x) \cdot \vec v = |\operatorname{grad} f(\mathbf x)|\, |\vec v| \cos\theta, \tag{6.8.11}$$

which becomes $|\operatorname{grad} f(\mathbf x)| \cos\theta$ if $\vec v$ is constrained to have length 1. This is maximal when $\theta = 0$, giving $\operatorname{grad} f(\mathbf x) \cdot \vec v = |\operatorname{grad} f(\mathbf x)|$. So we see that

The gradient of a function $f$ at $\mathbf x$ points in the direction in which $f$ increases the fastest, and has a length equal to its rate of increase in that direction.
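The statement about the direction of fastest increase can be tested by brute force in the plane. The sketch below is illustrative only: the function $f(x, y) = x^2y + y$, the sample point, and the helper name `numeric_gradient` are assumptions chosen for the example. It scans unit vectors $\vec v = (\cos t, \sin t)$ and records the direction in which the directional derivative $\operatorname{grad} f \cdot \vec v$ of Equation 6.8.10 is largest.

```python
import math


def numeric_gradient(f, p, h=1e-5):
    """Central-difference estimate of grad f at the point p."""
    grad = []
    for j in range(len(p)):
        pp, pm = list(p), list(p)
        pp[j] += h
        pm[j] -= h
        grad.append((f(pp) - f(pm)) / (2 * h))
    return grad


def f(p):
    x, y = p
    return x * x * y + y


point = [1.0, 2.0]
g = numeric_gradient(f, point)      # the exact gradient is (2xy, x^2 + 1) = (4, 2)

# Directional derivative grad f . v for unit vectors v = (cos t, sin t)
steps = 3600
best_rate, best_angle = max(
    (g[0] * math.cos(t) + g[1] * math.sin(t), t)
    for t in (2 * math.pi * k / steps for k in range(steps))
)

grad_length = math.hypot(g[0], g[1])
grad_angle = math.atan2(g[1], g[0])
```

Up to the resolution of the angular grid, `best_angle` should agree with `grad_angle` and `best_rate` with `grad_length`, as Equation 6.8.11 predicts.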

Remark. Some people find it easier to think of the gradient, which is a vector, and thus an element of $\mathbb R^n$, than to think of the derivative, which is a line matrix, and thus a linear function $\mathbb R^n \to \mathbb R$. They also find it easier to think that the gradient is orthogonal to the curve (or surface, or higher-dimensional manifold) of equation $f(\mathbf x) - c = 0$ than to think that $\ker[Df(\mathbf x)]$ is the tangent space to the curve (or surface or manifold).

Since the derivative is the transpose of the gradient, and vice versa, it may not seem to make any difference which perspective one chooses. But the derivative has an advantage that the gradient lacks: as Equation 6.8.10 makes clear, the derivative needs no extra geometric structure on $\mathbb R^n$, whereas the gradient requires the dot product. Sometimes (in fact usually) there is no natural dot product available. Thus the derivative of a function is the natural thing to consider.

But there is a place where gradients of functions really matter: in physics, gradients of potential energy functions are force fields, and we really want to think of force fields as vectors. For example, the gravitational force field is the vector $\begin{bmatrix} 0 \\ 0 \\ -gm \end{bmatrix}$, which we saw in Equation 6.4.2: this is the gradient of the height function (or rather, minus the gradient of the height function).

As it turns out, force fields are conservative exactly when they are gradients of functions, called potentials (discussed in Section 6.11). However, the potential is not observable, and discovering whether it exists from examining the force field is a big chapter in mathematical physics. △

By conservative we mean that the integral on a closed path is zero: i.e., the total energy expended is zero. The gravity force field is conservative, but any force field involving friction is not: the potential energy you lose going down a hill on a bicycle is never quite recouped when you roll up the other side.

In French the curl is known as the rotationnel, and originally in English it was called the rotation of the vector field. It was to avoid the abbreviation rot that the word curl was substituted.

Figure 6.8.2. Curl probe: put the paddle wheels at some spot of the fluid; the speed at which it rotates will be proportional to the component of the curl in the direction of the axle.

The flow is out if the box has the standard orientation. If not, the flow is in.


Geometric interpretation of the curl

The peculiar mixture of partials that go into the curl seems impenetrable. We aim to justify the following description.

The curl probe. Consider an axis, free to rotate in a bearing that you hold, and having paddles attached, as in Figure 6.8.2.

We will assume that the bearing is packed with a viscous fluid, so that its angular speed (not acceleration) is proportional to the torque exerted by the paddles. If a fluid is in constant motion with velocity vector field $\vec F$, then the curl of the velocity vector field at $\mathbf x$, $(\vec\nabla \times \vec F)(\mathbf x)$, is measured as follows:

The curl of a vector field at a point $\mathbf x$ points in the direction such that if you insert the paddle of the curl probe with its axis in that direction, it will spin the fastest. The speed at which it spins is proportional to the magnitude of the curl.


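Why should this be the case? Using Theorem 6.8.3(b) and Definition 6.7.1 of the exterior derivative, we see that

$$\Phi_{\vec\nabla \times \vec F}\bigl(P^o_{\mathbf x}(\vec v_1, \vec v_2)\bigr) = dW_{\vec F}\bigl(P^o_{\mathbf x}(\vec v_1, \vec v_2)\bigr) = \lim_{h \to 0} \frac{1}{h^2} \int_{\partial P^o_{\mathbf x}(h\vec v_1, h\vec v_2)} W_{\vec F} \tag{6.8.12}$$

measures the work of $\vec F$ around the parallelogram spanned by $\vec v_1$ and $\vec v_2$ (i.e., over its oriented boundary). If $\vec v_1$ and $\vec v_2$ are unit vectors orthogonal to the axis of the probe and to each other, this work is approximately proportional to the torque to which the probe will be subjected.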

Theorems 6.7.7 and 6.8.3 have the following important consequence in $\mathbb R^3$:

If $f$ is a $C^2$ function on an open subset $U \subset \mathbb R^3$, then $\operatorname{curl} \operatorname{grad} f = 0$.

Therefore, in order for a vector field to be the gradient of a function, its curl must be zero. This may seem obvious in terms of a falling apple; gravity does not exert any torque and cause the apple to spin. In more complicated settings, it is less obvious; if you observed the motions of stars in a galaxy, you might be tempted to think there was some curl, but there isn't. (We will see in Section 6.11 that having curl zero does not quite guarantee that a vector field is the gradient of a function.)
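The identity $\operatorname{curl} \operatorname{grad} f = 0$, and the analogous identity $\operatorname{div} \operatorname{curl} \vec F = 0$ that follows from the same two theorems, can be observed numerically. The sketch below is only an illustration under stated assumptions: the test function, the test vector field, the sample point and all helper names are made up for the example, and derivatives are replaced by nested central differences, so the results are zero only up to rounding error.

```python
import math

H = 1e-3  # step for central differences; nested differences need a larger step


def partial(g, j, p, h=H):
    """Central-difference estimate of D_j g at the point p (0-based j)."""
    pp, pm = list(p), list(p)
    pp[j] += h
    pm[j] -= h
    return (g(pp) - g(pm)) / (2 * h)


def grad(f):
    """Numerical gradient of a scalar function on R^3, as a vector field."""
    return lambda p: [partial(f, j, p) for j in range(3)]


def curl(F):
    """Numerical curl of a vector field on R^3, as a vector field."""
    def D(j, i, p):
        return partial(lambda q: F(q)[i], j, p)   # D_j F_i
    return lambda p: [
        D(1, 2, p) - D(2, 1, p),
        D(2, 0, p) - D(0, 2, p),
        D(0, 1, p) - D(1, 0, p),
    ]


def div(F):
    """Numerical divergence of a vector field on R^3, as a function."""
    return lambda p: sum(
        partial(lambda q, i=i: F(q)[i], i, p) for i in range(3)
    )


def f(p):
    return math.sin(p[0] * p[1]) + p[2] ** 2 * p[0]


def F(p):
    return [math.sin(p[1] * p[2]), p[0] ** 2 * p[2], math.exp(p[0]) * p[1]]


p0 = [0.5, -0.3, 0.8]
curl_grad = curl(grad(f))(p0)   # each component should be close to 0
div_curl = div(curl(F))(p0)     # should be close to 0
```

The cancellation here mirrors the proof of Theorem 6.7.7: the discrete difference operators commute, just as the crossed partials do, so the terms cancel in pairs.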


Geometric interpretation of the divergence

The divergence is easier to interpret than the curl. Putting together the formula of Theorem 6.8.3(c) and Definition 6.7.1 of the exterior derivative, we see that the divergence of $\vec F$ at a point $\mathbf x$ is proportional to the flux of $\vec F$ through the boundary of a small box around $\mathbf x$, i.e., the net flow out of the box. In particular, if the fluid is incompressible, the divergence of its velocity vector field is 0: exactly as much must flow in as out. Thus, the divergence measures the extent to which flow along the vector field changes the density.

Again, Theorems 6.7.7 and 6.8.3 have the following consequence:

If $\vec F$ is a $C^2$ vector field on an open subset $U \subset \mathbb R^3$, then $\operatorname{div} \operatorname{curl} \vec F = 0$.

Remark. Theorem 6.7.7 says nothing about

$$\operatorname{div} \operatorname{grad} f, \quad \operatorname{grad} \operatorname{div} \vec F, \quad \text{or} \quad \operatorname{curl} \operatorname{curl} \vec F,$$

which are also of interest (and which are not 0); they are three incarnations of the Laplacian. △

The Laplacian is arguably the most important differential operator in existence. In $\mathbb R^3$ it is

$$D_1^2 + D_2^2 + D_3^2;$$

it measures to what extent a graph is “tight.” It shows up in electromagnetism, relativity, elasticity, complex analysis ....


6.9 The Generalized Stokes’s Theorem

We worked pretty hard to define the exterior derivative, and now we are going to reap some rewards for our labor: we are going to see that there is a higher-dimensional analog of the fundamental theorem of calculus, Stokes’s theorem. It covers in one statement the four integral theorems of vector calculus, which are explored in detail in Section 6.10.

This theorem is also known as the generalized Stokes’s theorem, to distinguish it from the special case (surfaces in $\mathbb R^3$) discussed in Section 6.10.

Names associated with the generalized Stokes’s theorem include Poincaré (1895), Volterra (1889), Brouwer (1906), and Élie Cartan, who formalized the theory of differential forms in the early 20th century.

Recall the fundamental theorem of calculus:

Theorem 6.9.1 (Fundamental theorem of calculus). If $f$ is a $C^1$ function on a neighborhood of $[a, b]$, then

$$\int_a^b f'(t)\, dt = f(b) - f(a). \tag{6.9.1}$$

Restate this as

$$\int_{[a,b]} df = \int_{\partial[a,b]} f, \tag{6.9.2}$$

i.e., the integral of $df$ over an oriented interval is equal to the integral of $f$ over the oriented boundary of the interval. In this form, the statement generalizes to higher dimensions:
Theorem 6.9.2 (Generalized Stokes’s theorem). Let $X$ be a compact piece-with-boundary of a $(k+1)$-dimensional oriented manifold $M \subset \mathbb R^n$. Give the boundary $\partial X$ of $X$ the boundary orientation, and let $\varphi$ be a $k$-form defined on a neighborhood of $X$. Then

$$\int_{\partial X} \varphi = \int_X d\varphi. \tag{6.9.3}$$

This beautiful, short statement is the main result of the theory of forms.

Note that the dimensions in Equation 6.9.3 make sense: if $X$ is $(k+1)$-dimensional, $\partial X$ is $k$-dimensional, and if $\varphi$ is a $k$-form, $d\varphi$ is a $(k+1)$-form, so $d\varphi$ can be integrated over $X$, and $\varphi$ can be integrated over $\partial X$.

This is a wonderful theorem; it is probably the best tool mathematicians have for deducing global properties from local properties.

The square $S$ has side length 2, so its area is 4.


Example 6.9.3 (Integrating over the boundary of a square). You apply Stokes’s theorem every time you use anti-derivatives to compute an integral; to compute the integral of the 1-form $f(x)\, dx$ over the oriented line segment $[a, b]$, you begin by finding a function $g(x)$ such that $dg(x) = f(x)\, dx$, and then say

$$\int_a^b f(x)\, dx = \int_{[a,b]} dg = \int_{\partial[a,b]} g = g(b) - g(a). \tag{6.9.4}$$

This isn’t quite the way it is usually used in higher dimensions, where “looking for anti-derivatives” has a different flavor.

For instance, to compute the integral $\int_C x\, dy - y\, dx$, where $C$ is the boundary of the square $S$ described by the inequalities $|x|, |y| \le 1$, with the boundary orientation, one possibility is to parametrize the four sides of the square (being careful to get the orientations right), then to integrate $x\, dy - y\, dx$ over all four sides and add. Another possibility is to apply Stokes’s theorem:

$$\int_C x\, dy - y\, dx = \int_S (dx \wedge dy - dy \wedge dx) = \int_S 2\, dx \wedge dy = 8. \tag{6.9.5}$$

What is the integral over $C$ of $x\, dy + y\, dx$? Check below.¹⁰
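The “parametrize the four sides” route of Example 6.9.3 is short enough to carry out numerically. The sketch below is an illustration only; the helper name `line_integral` and the choice of a midpoint rule are assumptions made for the example. It integrates a 1-form $P\, dx + Q\, dy$ along the boundary of the square, traversed counterclockwise, which is the boundary orientation.

```python
def line_integral(P, Q, path, steps=1000):
    """Midpoint-rule estimate of the integral of P dx + Q dy along the
    straight segments joining consecutive points of `path`."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        dx = (x1 - x0) / steps
        dy = (y1 - y0) / steps
        for k in range(steps):
            t = (k + 0.5) / steps
            x = x0 + t * (x1 - x0)
            y = y0 + t * (y1 - y0)
            total += P(x, y) * dx + Q(x, y) * dy
    return total


# Boundary of the square |x|, |y| <= 1, counterclockwise
square = [(-1, -1), (1, -1), (1, 1), (-1, 1), (-1, -1)]

first = line_integral(lambda x, y: -y, lambda x, y: x, square)   # x dy - y dx
second = line_integral(lambda x, y: y, lambda x, y: x, square)   # x dy + y dx
```

The first value should agree with twice the area of the square, as in Equation 6.9.5, and the second should vanish because $d(x\, dy + y\, dx) = 0$.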


¹⁰ $d(x\, dy + y\, dx) = dx \wedge dy + dy \wedge dx = 0$, so the integral is 0.

Example 6.9.4 (Integrating over the boundary of a cube). Let us integrate the 2-form

$$\varphi = (x - y^2 + z^3)(dy \wedge dz + dx \wedge dz + dx \wedge dy) \tag{6.9.6}$$

over the boundary of the cube $C_a$ given by $0 \le x, y, z \le a$. It is quite possible to do this directly, parametrizing all six faces of the cube, but Stokes’s theorem simplifies things substantially.

Computing the exterior derivative of $\varphi$ gives

$$
\begin{aligned}
d\varphi &= dx \wedge dy \wedge dz - 2y\, dy \wedge dx \wedge dz + 3z^2\, dz \wedge dx \wedge dy \\
&= (1 + 2y + 3z^2)\, dx \wedge dy \wedge dz,
\end{aligned}
\tag{6.9.7}
$$

so

$$
\begin{aligned}
\int_{\partial C_a} \varphi &= \int_{C_a} (1 + 2y + 3z^2)\, dx \wedge dy \wedge dz = \int_0^a \int_0^a \int_0^a (1 + 2y + 3z^2)\, dx\, dy\, dz \\
&= a^2\bigl([x]_0^a + [y^2]_0^a + [z^3]_0^a\bigr) = a^2(a + a^2 + a^3). \quad \triangle
\end{aligned}
\tag{6.9.8}
$$

Computing this exterior derivative is less daunting if you are alert for terms that can be discarded. Denote $(x_1 - x_2^2 + x_3^3 - \dots \pm x_n^n)$ by $f$. Then $D_1f\, dx_1 = dx_1$, $D_2f\, dx_2 = -2x_2\, dx_2$, $D_3f\, dx_3 = 3x_3^2\, dx_3$ and so on, ending with $\pm n x_n^{n-1}\, dx_n$. For the first, the only term of

$$\sum_{i=1}^n dx_1 \wedge \dots \wedge \widehat{dx_i} \wedge \dots \wedge dx_n$$

that survives is that in which $i = 1$, giving $dx_1 \wedge dx_2 \wedge \dots \wedge dx_n$. For $D_2f$, the only term of the sum that survives is $dx_1 \wedge dx_3 \wedge \dots \wedge dx_n$, giving $-2x_2\, dx_2 \wedge dx_1 \wedge dx_3 \wedge \dots \wedge dx_n$; when the order is corrected this gives $2x_2\, dx_1 \wedge dx_2 \wedge \dots \wedge dx_n$. In the end, all the terms are followed simply by $dx_1 \wedge \dots \wedge dx_n$, and any minus signs have become plus.

This parametrization is “obvious” because $x$ and $y$ parametrize the top of the cube, and at the top, $z = 1$.


Example 6.9.5 (Stokes’s theorem: a harder example). Now let’s try something similar to Example 6.9.4, but harder, integrating

$$\varphi = (x_1 - x_2^2 + x_3^3 - \dots \pm x_n^n)\Bigl(\sum_{i=1}^n dx_1 \wedge \dots \wedge \widehat{dx_i} \wedge \dots \wedge dx_n\Bigr) \tag{6.9.9}$$

over the boundary of the cube $C_a$ given by $0 \le x_j \le a$, $j = 1, \dots, n$.

This time, the idea of computing the integral directly is pretty awesome: parametrizing all $2n$ faces of the cube, etc. Doing it using Stokes’s theorem is also pretty awesome, but much more manageable.

We know how to compute $d\varphi$, and it comes out to

$$d\varphi = (1 + 2x_2 + 3x_3^2 + \dots + n x_n^{n-1})\, dx_1 \wedge \dots \wedge dx_n. \tag{6.9.10}$$

The integral of $j x_j^{j-1}\, dx_1 \wedge \dots \wedge dx_n$ over $C_a$ is

$$\int_0^a \cdots \int_0^a j x_j^{j-1}\, |d^n\mathbf x| = a^{j+n-1}, \tag{6.9.11}$$

so the whole integral is $a^n(1 + a + \dots + a^{n-1})$. △
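For $n = 3$ the form of Example 6.9.5 is exactly the form of Example 6.9.4, so both sides of Stokes’s theorem can be compared numerically. The sketch below is an illustration under stated assumptions: the helper names, the midpoint rule and the value $a = 2$ are choices made for the example. The boundary integral uses the fact that $\varphi = f\,(dy \wedge dz + dx \wedge dz + dx \wedge dy)$ is the flux form of $\vec G = (f, -f, f)$ (because $dx \wedge dz = -dz \wedge dx$), integrated against the outward normal of each face.

```python
def f(x, y, z):
    return x - y ** 2 + z ** 3


def boundary_integral(a, n=40):
    """Integral of phi over the six faces of [0, a]^3, boundary orientation."""
    h = a / n
    mids = [(k + 0.5) * h for k in range(n)]
    total = 0.0
    for u in mids:
        for v in mids:
            # Outward flux of G = (f, -f, f); opposite faces are paired.
            total += (f(a, u, v) - f(0.0, u, v)) * h * h    # faces x = a, x = 0
            total += -(f(u, a, v) - f(u, 0.0, v)) * h * h   # faces y = a, y = 0
            total += (f(u, v, a) - f(u, v, 0.0)) * h * h    # faces z = a, z = 0
    return total


def volume_integral(a, n=40):
    """Midpoint-rule integral of 1 + 2y + 3z^2 (Equation 6.9.7) over [0, a]^3."""
    h = a / n
    mids = [(k + 0.5) * h for k in range(n)]
    return sum(1 + 2 * y + 3 * z * z
               for _x in mids for y in mids for z in mids) * h ** 3


a = 2.0
lhs = boundary_integral(a)            # integral of phi over the boundary
rhs = volume_integral(a)              # integral of d(phi) over the cube
exact = a ** 2 * (a + a ** 2 + a ** 3)
```

With $a = 2$ the closed form $a^2(a + a^2 + a^3)$ equals 56; `lhs` should match it closely, and `rhs` should match it up to the discretization error of the midpoint rule.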

The examples above bring out one unpleasant feature of Stokes’s theorem: it only relates the integral of a $(k-1)$-form to the integral of a $k$-form if the former is integrated over a boundary. It is often possible to skirt this difficulty, as in the example below.

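Example 6.9.6 (Integrating over faces of a cube). Let $S$ be the union of the faces of the cube $C$ given by $-1 \le x, y, z \le 1$ except the top face, oriented by the outward pointing normal. What is $\int_S \Phi_{\vec F}$, where $\vec F = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$?

The integral of $\Phi_{\vec F}$ over the whole boundary $\partial C$ is by Stokes’s theorem the integral over $C$ of $d\Phi_{\vec F} = \operatorname{div} \vec F\, dx \wedge dy \wedge dz = 3\, dx \wedge dy \wedge dz$, so

$$\int_{\partial C} \Phi_{\vec F} = \int_C \operatorname{div} \vec F\, dx \wedge dy \wedge dz = 3 \int_C dx \wedge dy \wedge dz = 24. \tag{6.9.12}$$

Now we must subtract from that the integral over the top. Using the obvious parametrization $\gamma\begin{pmatrix} s \\ t \end{pmatrix} = \begin{pmatrix} s \\ t \\ 1 \end{pmatrix}$, $-1 \le s, t \le 1$, gives

$$\int_{-1}^{1} \int_{-1}^{1} \det \begin{bmatrix} s & 1 & 0 \\ t & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} |ds\, dt| = 4. \tag{6.9.13}$$

So the whole integral is $24 - 4 = 20$. △

The matrix in Equation 6.9.13 is

$$\Bigl[\vec F\bigl(\gamma(s, t)\bigr),\ \overrightarrow{D_1}\gamma(s, t),\ \overrightarrow{D_2}\gamma(s, t)\Bigr].$$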

You could also argue that all faces must contribute the same amount to the flux, so the top must contribute $24/6 = 4$.

Proof of the generalized Stokes’s theorem

Before starting the proof of the generalized Stokes’s theorem, we want to sketch two proofs of the fundamental theorem of calculus, Theorem 6.9.1. You probably saw the first in first-year calculus, but it is the other that will generalize to prove Stokes’s theorem.



First proof of the fundamental theorem of calculus

Set $F(x) = \int_a^x f(t)\, dt$. We will show that

$$F'(x) = f(x), \tag{6.9.14}$$

as Figure 6.9.1 suggests. Indeed,

$$F'(x) = \lim_{h \to 0} \frac{1}{h}\Bigl(\int_a^{x+h} f(t)\, dt - \int_a^x f(t)\, dt\Bigr) = \lim_{h \to 0} \frac{1}{h} \underbrace{\int_x^{x+h} f(t)\, dt}_{\approx\, hf(x)} = f(x). \tag{6.9.15}$$

(The last integral is approximately $hf(x)$; the error disappears in the limit.) Now consider the function

$$f(x) - \underbrace{\int_a^x f'(t)\, dt}_{\text{with deriv. } f'(x)}. \tag{6.9.16}$$

The argument above shows that its derivative is zero, so it is constant; evaluating the function at $x = a$, we see that the constant is $f(a)$. Thus

$$f(b) - \int_a^b f'(t)\, dt = f(a). \quad \square \tag{6.9.17}$$

Figure 6.9.1. Computing the derivative of $F$.

Figure 6.9.2. A Riemann sum as an approximation to the integral in Equation 6.9.18.


Second proof of the fundamental theorem of calculus. 

Here the appropriate drawing is the Riemann sum drawing of Figure 6.9.2. 
By the very definition of the integral,

\[ \int_a^b f(x)\,dx \approx \sum_i f(x_i)(x_{i+1} - x_i), \tag{6.9.18} \]

where $x_0 < x_1 < \dots < x_m$ decompose [a, b] into m little pieces, with $a = x_0$
and $b = x_m$.

By Taylor’s theorem,

\[ f(x_{i+1}) \approx f(x_i) + f'(x_i)(x_{i+1} - x_i). \tag{6.9.19} \]

These two statements together (the first applied to $f'$) give

\[ \int_a^b f'(x)\,dx \approx \sum_{i=0}^{m-1} f'(x_i)(x_{i+1} - x_i) \approx \sum_{i=0}^{m-1} \bigl(f(x_{i+1}) - f(x_i)\bigr). \tag{6.9.20} \]

In the far right-hand term all the interior $x_i$’s cancel:

\[ \sum_{i=0}^{m-1} \bigl(f(x_{i+1}) - f(x_i)\bigr) = f(x_1) - f(x_0) + f(x_2) - f(x_1) + \dots + f(x_m) - f(x_{m-1}), \tag{6.9.21} \]

leaving $f(x_m) - f(x_0)$, i.e., f(b) - f(a).

You may take your pick as to which proof you prefer in the one-dimensional case, but only the second proof generalizes well to a proof of the generalized Stokes’s theorem. In fact, the proofs are almost identical.

Figure 6.9.3. Although the staircase is very close to the curve, its length is not close to the length of the curve, i.e., the curve does not fit well with a dyadic decomposition. In this case the informal proof of Stokes’s theorem is not enough.

Let us analyze a little more closely the errors we are making at each step;
we are adding more and more terms together as the partition becomes finer, so
the errors had better be getting smaller faster, or they will not disappear in the
limit. Suppose we have decomposed the interval into m pieces. Then when we
replace the integral in Equation 6.9.20 by the first sum, we are making m errors,
each bounded as follows. The first equality uses the fact that $A(b - a) = \int_a^b A\,dx$.

\[ \begin{aligned} \left|\int_{x_i}^{x_{i+1}} f'(x)\,dx - f'(x_i)(x_{i+1} - x_i)\right| &= \left|\int_{x_i}^{x_{i+1}} \bigl(f'(x) - f'(x_i)\bigr)\,dx\right| \\ &\le \int_{x_i}^{x_{i+1}} \sup|f''|\,(x - x_i)\,dx = \sup|f''| \int_{x_i}^{x_{i+1}} (x - x_i)\,dx \\ &= \sup|f''|\,\frac{(b-a)^2}{2m^2}. \end{aligned} \tag{6.9.22} \]

We get the last equality in Equation 6.9.22 because the length of a little interval $x_{i+1} - x_i$ is precisely the length $b - a$ of the original interval divided by the number m of pieces.

We also need to remember the error term from Taylor’s theorem, Equation
6.9.19, which turns out to be about the same. So all in all, we made m errors,
each of which is $\le C_1/m^2$, where $C_1$ is a constant that does not depend on m.
Multiplying that maximal error for each piece by the number m of pieces leaves
an m in the denominator, and a constant in the numerator, so the error tends
to 0 as the decomposition becomes finer and finer. □
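The error estimate in the second proof can be watched in action. The sketch below is only an illustration under our own assumptions: we pick $f = \sin$ on [0, 1] with m equal pieces, and the helper names are hypothetical. It compares the Riemann sum of $f'$ with $f(b) - f(a)$ and prints m times the error, which should stay bounded if the total error behaves like a constant divided by m.

```python
import math


def left_riemann_sum_of_derivative(fprime, a, b, m):
    """Sum of f'(x_i) * (x_{i+1} - x_i) over m equal pieces of [a, b]."""
    h = (b - a) / m
    return sum(fprime(a + i * h) * h for i in range(m))


def total_error(f, fprime, a, b, m):
    """Gap between the Riemann sum of f' and the telescoped value f(b) - f(a)."""
    return abs(left_riemann_sum_of_derivative(fprime, a, b, m) - (f(b) - f(a)))


# Assumed test case: f = sin, f' = cos, on the interval [0, 1].
errors = {m: total_error(math.sin, math.cos, 0.0, 1.0, m) for m in (10, 20, 40, 80)}
for m, err in errors.items():
    print(m, err, m * err)
```

For this choice $\sup|f''| = \sin 1$ on [0, 1], so the bound from Equation 6.9.22, multiplied by the m pieces, is $\sin(1)/(2m)$; the printed products m × error should sit below $\sin(1)/2$.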
An informal proof of Stokes’s theorem

Suppose you decompose X into little pieces that are approximated by oriented
(k + 1)-parallelograms $P_i$:

\[ P_i = P_{x_i}(\vec v_{1,i}, \vec v_{2,i}, \dots, \vec v_{k+1,i}). \tag{6.9.23} \]

Then

\[ \int_X d\varphi \approx \sum_i d\varphi(P_i) \approx \sum_i \int_{\partial P_i} \varphi \approx \int_{\partial X} \varphi. \tag{6.9.24} \]

The first approximate sign is just the definition of the integral; the ≈ becomes
an equality in the limit as the decomposition becomes infinitely fine. The second
approximate sign comes from our definition of the exterior derivative.

When we add over all the $P_i$, all the internal boundaries cancel, leaving $\int_{\partial X} \varphi$.

As in the case of Riemann sums, we need to understand the errors that are
signaled by our ≈ signs. If our parallelograms $P_i$ have side ε, then there are
approximately $\varepsilon^{-(k+1)}$ such parallelograms. The errors in the first and second
replacements are of order $\varepsilon^{k+2}$. For the first, it is our definition of the integral,
and the error becomes small as the decomposition becomes infinitely fine. For
the second, from the definition of the exterior derivative

\[ d\varphi(P_i) = \int_{\partial P_i} \varphi + \text{terms of order } (k + 2), \tag{6.9.25} \]

so indeed the errors disappear in the limit. □

We find this argument convincing, but it is not quite rigorous. For a rigorous proof, see Appendix A.22. The problem with this informal argument is that the boundary of X does not necessarily fit well with the boundaries of the little cubes, as illustrated by Figure 6.9.3.

Figure 6.9.4. The integral of $d\varphi$ over $U_-$ equals the integral of φ over the boundary of $U_-$; we will see in Equation 6.9.34 that this is equal to the integral of φ over E.

In this case, the easy proof works because the boundary of X fits perfectly with the boundary of the dyadic cubes.


A situation where the easy proof works

We will now describe a situation where the proof in Section 6.9 really does work.
In this simple case, we have a (k - 1)-form in $\mathbb R^k$, and the boundary of the piece
we will integrate over is simply the subspace $E \subset \mathbb R^k$ of equation $x_1 = 0$. There
are no manifolds; nothing curvy. Figure 6.9.4 illustrates Proposition 6.9.7.

Proposition 6.9.7. Let U be a bounded open subset of $\mathbb R^k$, and let $U_-$ be
the subset of U where the first coordinate is non-positive (i.e., $x_1 \le 0$). Give
U the standard orientation of $\mathbb R^k$ (by det), and give the boundary orientation
to $\partial U_- = U \cap E$. Let φ be a (k - 1)-form on $\mathbb R^k$ of class $C^2$, which vanishes
identically outside U. Then

\[ \int_{\partial U_-} \varphi = \int_{U_-} d\varphi. \tag{6.9.26} \]

Proof. We will repeat the informal proof above, being a bit more careful about
the bounds. Choose ε > 0, and denote by $\mathbb R^k_-$ the subset of $\mathbb R^k$ where $x_1 \le 0$.
Recall from the proof of Theorem 6.7.3 (Equation A20.15) that there exist$^{11}$
a constant K and δ > 0 such that when |h| < δ,

\[ \left| d\varphi\bigl(P_x(h\vec e_1, \dots, h\vec e_k)\bigr) - \int_{\partial P_x(h\vec e_1, \dots, h\vec e_k)} \varphi \right| \le K h^{k+1}. \tag{6.9.27} \]

That is why we required φ to be of class $C^2$, so that the second derivatives
of the coefficients of φ are bounded. Take the dyadic decomposition $\mathcal D_N(\mathbb R^k)$,
where $h = 2^{-N}$. By taking N sufficiently large, we can guarantee that the
difference between the integral of $d\varphi$ over $U_-$ and the Riemann sum is less than
ε/2:

\[ \left| \int_{U_-} d\varphi - \sum_{C \in \mathcal D_N(\mathbb R^k_-)} d\varphi(C) \right| < \frac{\varepsilon}{2}. \tag{6.9.28} \]

When we evaluate $d\varphi$ on C in Equation 6.9.28, we are thinking of C as an oriented parallelogram, anchored at its lower left-hand corner.

Now we replace the k-parallelograms of Equation 6.9.27 by dyadic cubes,
and evaluate the total difference between the exterior derivative of φ over the
cubes C, and φ over the boundaries of the C. The number of cubes of $\mathcal D_N(\mathbb R^k_-)$
that intersect the support of φ is at most $L2^{kN}$ for some constant L, and since
$h = 2^{-N}$, the bound for each error is now $K2^{-N(k+1)}$, so

\[ \left| \sum_{C \in \mathcal D_N(\mathbb R^k_-)} d\varphi(C) - \sum_{C \in \mathcal D_N(\mathbb R^k_-)} \int_{\partial C} \varphi \right| \le \underbrace{L2^{kN}}_{\text{No. of cubes}}\ \underbrace{K2^{-N(k+1)}}_{\text{bound for each error}} = LK2^{-N}. \tag{6.9.29} \]

This can also be made < ε/2 by taking N sufficiently large; to be precise,
by taking

\[ N \ge \frac{\log 2LK - \log \varepsilon}{\log 2}. \tag{6.9.30} \]

One important advantage of allowing boundaries to have corners, rather than requiring that they be smooth, is that cubes have corners. Thus they are assumed under the general theory, and do not require separate treatment.

Putting these inequalities together, we get

\[ \underbrace{\left| \int_{U_-} d\varphi - \sum_{C \in \mathcal D_N(\mathbb R^k_-)} d\varphi(C) \right|}_{< \varepsilon/2} + \underbrace{\left| \sum_{C \in \mathcal D_N(\mathbb R^k_-)} d\varphi(C) - \sum_{C \in \mathcal D_N(\mathbb R^k_-)} \int_{\partial C} \varphi \right|}_{< \varepsilon/2} < \varepsilon, \tag{6.9.31} \]

so in particular, when N is sufficiently large we have

\[ \left| \int_{U_-} d\varphi - \sum_{C \in \mathcal D_N(\mathbb R^k_-)} \int_{\partial C} \varphi \right| < \varepsilon. \tag{6.9.32} \]

Finally, all the internal boundaries in the sum

\[ \sum_{C \in \mathcal D_N(\mathbb R^k_-)} \int_{\partial C} \varphi \tag{6.9.33} \]

cancel, since each appears twice with opposite orientations. The only boundaries
that count are those in $\mathbb R^{k-1}$. So (using C′ to denote cubes of the dyadic
decomposition of $\mathbb R^{k-1}$)

\[ \sum_{C \in \mathcal D_N(\mathbb R^k_-)} \int_{\partial C} \varphi = \sum_{C' \in \mathcal D_N(E)} \int_{C'} \varphi = \int_E \varphi = \int_{\partial U_-} \varphi. \tag{6.9.34} \]

(We get the last equality because φ vanishes identically outside U, and therefore,
on E, outside $\partial U_- = U \cap E$.) So

\[ \left| \int_{U_-} d\varphi - \int_{\partial U_-} \varphi \right| < \varepsilon. \tag{6.9.35} \]

Since ε is arbitrary, the proposition follows. □

Of course forms can be integrated only over oriented domains, so the E in the third term of Equation 6.9.34 must be oriented. But E is really $\mathbb R^{k-1}$, with coordinates $x_2, \dots, x_k$, and the boundary orientation of $\mathbb R^k_-$ is the standard orientation of $\mathbb R^{k-1}$. In Figure 6.9.4, it is shown as the line oriented from bottom to top.

11 The constant in Equation A20.25 (there called C, not K), comes from Taylor’s
theorem with remainder, and involves the suprema of the second derivatives.


6.10 The Integral Theorems of Vector Calculus

The four forms of the generalized Stokes’s theorem that make sense in $\mathbb R^2$
and $\mathbb R^3$ don’t say anything that is not contained in that theorem, but each is
of great importance in many applications; these theorems should all become
personal friends, or at least acquaintances. They are used everywhere in
electromagnetism, fluid mechanics, and many other fields.

Theorem 6.10.1 (Fundamental theorem for line integrals). Let C
be an oriented curve in $\mathbb R^2$ or $\mathbb R^3$ (or for that matter any $\mathbb R^n$), with oriented
boundary $(P_b - P_a)$, and let f be a function defined on a neighborhood of
C. Then

\[ \int_C df = f(b) - f(a). \tag{6.10.1} \]

Using a parametrization, Theorem 6.10.1 can easily be reduced to the ordinary fundamental theorem of calculus, Theorem 6.9.1, which it is if n = 1.

We could also call this the fundamental theorem for integrals over curves; “line integrals” is more traditional.


Green’s theorem and Stokes’s theorem

Green’s theorem is the special case of Stokes’s theorem for surface integrals
when the surface is flat.

Yes, we do need both bounded’s in Theorem 6.10.2. The exterior of the unit disk is bounded by the unit circle, but is not bounded.

Theorem 6.10.2 (Green’s theorem). Let S be a bounded region of
$\mathbb R^2$, bounded by a curve C (or several curves $C_i$), carrying the boundary
orientation as described in Definition 6.6.12. Let $\vec F$ be a vector field defined
on a neighborhood of S. Then

\[ \int_S dW_{\vec F} = \int_C W_{\vec F}, \quad\text{or}\quad \int_S dW_{\vec F} = \sum_i \int_{C_i} W_{\vec F}. \tag{6.10.2} \]

This is traditionally written

\[ \int_S (D_1 g - D_2 f)\,dx\,dy = \int_C f\,dx + g\,dy. \tag{6.10.3} \]

To see that the two versions are the same, write $W_{\vec F} = f(x, y)\,dx + g(x, y)\,dy$
and use Theorem 6.7.3 to compute its exterior derivative:

\[ \begin{aligned} dW_{\vec F} &= d(f\,dx + g\,dy) = df \wedge dx + dg \wedge dy \\ &= (D_1 f\,dx + D_2 f\,dy) \wedge dx + (D_1 g\,dx + D_2 g\,dy) \wedge dy \\ &= D_2 f\,dy \wedge dx + D_1 g\,dx \wedge dy = (D_1 g - D_2 f)\,dx \wedge dy. \end{aligned} \tag{6.10.4} \]

There is a good deal of contention as to who should get credit for these important results. The Russians attribute them to Michael Ostrogradski, who presented them to the St. Petersburg Academy of Sciences in 1828. Green published his paper, privately, in 1828, but his result was largely overlooked until Lord Kelvin rediscovered it in 1846. Stokes proved Stokes’s theorem, which he asked on an examination in Cambridge in 1854. Gauss proved the divergence theorem, also known as Gauss’s theorem.

The curve C in Theorem 6.10.4 may well consist of several pieces $C_i$.

Example 6.10.3 (Green’s theorem). What is the integral

\[ \int_{\partial U} 2xy\,dy + x^2\,dx, \tag{6.10.5} \]

where U is the part of the disk of radius R centered at the origin where $y \ge 0$,
with the standard orientation?

This corresponds to Green’s theorem, with $f(x, y) = x^2$ and $g(x, y) = 2xy$,
so that $D_1 g = 2y$ and $D_2 f = 0$. Green’s theorem says

\[ \begin{aligned} \int_{\partial U} 2xy\,dy + x^2\,dx &= \int_U (D_1 g - D_2 f)\,dx\,dy = \int_U 2y\,dx\,dy \\ &= \int_0^\pi \int_0^R (2r \sin\theta)\, r\,dr\,d\theta = \frac{2R^3}{3} \int_0^\pi \sin\theta\,d\theta = \frac{4R^3}{3}. \end{aligned} \tag{6.10.6} \]

What happens if we integrate over the boundary of the entire disk?$^{12}$
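Both answers can be checked without Green’s theorem by integrating the 1-form directly along the boundary. The Python sketch below does this with a midpoint rule; the function name `boundary_integral`, the radius R = 1.5, and the number of sample points are our own illustrative choices.

```python
import math


def boundary_integral(R, theta_end, include_diameter, n=20000):
    """Midpoint-rule value of the line integral of 2xy dy + x^2 dx along the
    arc of radius R from angle 0 to theta_end (counterclockwise), optionally
    followed by the diameter from (-R, 0) back to (R, 0)."""
    total = 0.0
    dt = theta_end / n
    for i in range(n):
        t = (i + 0.5) * dt
        x, y = R * math.cos(t), R * math.sin(t)
        dx, dy = -R * math.sin(t) * dt, R * math.cos(t) * dt
        total += 2 * x * y * dy + x * x * dx
    if include_diameter:
        ds = 2 * R / n
        for i in range(n):
            x = -R + (i + 0.5) * ds
            total += x * x * ds  # y = 0 and dy = 0 on the diameter
    return total


R = 1.5
half_disk = boundary_integral(R, math.pi, include_diameter=True)
full_disk = boundary_integral(R, 2 * math.pi, include_diameter=False)
print(half_disk, 4 * R**3 / 3, full_disk)
```

The first two printed numbers should agree closely, and the third should be close to 0, in line with the symmetry argument for the whole disk.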


Theorem 6.10.4 (Stokes’s theorem). Let S be an oriented surface in
$\mathbb R^3$, bounded by a curve C that is given the boundary orientation. Let φ be
a 1-form field defined on a neighborhood of S. Then

\[ \int_S d\varphi = \int_C \varphi. \tag{6.10.7} \]

Again, let’s translate this into classical notation. First, and without loss of
generality, we can write $\varphi = W_{\vec F}$, so that Theorem 6.10.4 becomes

\[ \int_S dW_{\vec F} = \int_S \Phi_{\operatorname{curl} \vec F} = \int_C W_{\vec F}. \tag{6.10.8} \]


12 It is 0, by symmetry: the integral of 2y over the top semi-disk cancels the integral
over the bottom semi-disk.

This still isn’t the classical notation. Let $\vec N$ be the normal unit vector field
on S defining the orientation, and $\vec T$ be the unit vector field on C defining
the orientation there. Then

\[ \iint_S \bigl(\operatorname{curl} \vec F(x)\bigr) \cdot \vec N(x)\,|d^2x| = \int_C \vec F(x) \cdot \vec T(x)\,|d^1x|. \tag{6.10.9} \]

The $\vec N\,|d^2x|$ in the left-hand side of Equation 6.10.9 takes the parallelogram $P_x(\vec v, \vec w)$ and returns the vector $\vec N(x)\,|\vec v \times \vec w|$, since the integrand $|d^2x|$ is the element of area; given a parallelogram, it returns its area, i.e., the length of the cross-product of its sides. When integrating over S, the only parallelograms $P_x(\vec v, \vec w)$ we will evaluate the integrand on are tangent to S at x, and with compatible orientation, so that $\vec v \times \vec w$ is a multiple of $\vec N(x)$, in fact

\[ \vec v \times \vec w = |\vec v \times \vec w|\,\vec N(x), \]

since $\vec N(x)$ is a vector of unit length and perpendicular to the surface. So

\[ \operatorname{curl} \vec F(x) \cdot \vec N(x)\,|d^2x|\,\bigl(P_x(\vec v_1, \vec v_2)\bigr) = \operatorname{curl} \vec F(x) \cdot (\vec v_1 \times \vec v_2) = \det[\operatorname{curl} \vec F(x), \vec v_1, \vec v_2], \]

i.e., the flux of the vector field $\operatorname{curl} \vec F$ acting on $\vec v_1$ and $\vec v_2$.

The left-hand side of Equation 6.10.9 is discussed in the margin. Here let’s
compare the right-hand sides of Equations 6.10.8 and 6.10.9. Let us set
$\vec F = \begin{bmatrix} F_1 \\ F_2 \\ F_3 \end{bmatrix}$. In the right-hand side of Equation 6.10.8, the integrand is
$W_{\vec F} = F_1\,dx + F_2\,dy + F_3\,dz$; given a vector $\vec v$, it returns the number $F_1 v_1 + F_2 v_2 + F_3 v_3$.

In Equation 6.10.9, $\vec T(x)\,|d^1x|$ is a complicated way of expressing the identity:
given a vector $\vec v$, it returns $\vec T(x)$ times the length of $\vec v$. Since $\vec T(x)$ is a unit
vector, the result is a vector with length $|\vec v|$, tangent to the curve. When
integrating, we are only going to evaluate the integrand on vectors tangent to
the curve and pointing in the direction of $\vec T$, so this process just takes such a
vector and returns precisely the same vector. So $\vec F(x) \cdot \vec T(x)\,|d^1x|$ takes a vector
$\vec v$ and returns the number

\[ \bigl(\vec F(x) \cdot \vec T(x)\,|d^1x|\bigr)(\vec v) = \begin{bmatrix} F_1 \\ F_2 \\ F_3 \end{bmatrix} \cdot \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = F_1 v_1 + F_2 v_2 + F_3 v_3 = W_{\vec F}(\vec v). \tag{6.10.10} \]

Exercise 6.5.1 shows that for appropriate curves, orienting by decreasing polar angle means that the curve is oriented clockwise.


Example 6.10.5 (Stokes’s theorem). Let C be the intersection of the
cylinder of equation $x^2 + y^2 = 1$ with the surface of equation $z = \sin xy + 2$.
Orient C so that the polar angle decreases along C. What is the work over C
of the vector field



6.10.11 


It’s not so obvious how to visualize C, much less integrate over it. Stokes’s
theorem says there is an easier approach: compute the integral over the subsurface
S consisting of the cylinder $x^2 + y^2 = 1$ bounded at the top by C and at the
bottom by the unit circle $C_1$ in the (x, y)-plane, oriented counterclockwise.

By Stokes’s theorem, the integral over C plus the integral over $C_1$ equals
the integral over S, so rather than integrate over the irregular curve C, we will
integrate over S and then subtract the integral over $C_1$. First we integrate over
S:

\[ \int_C W_{\vec F} + \int_{C_1} W_{\vec F} = \int_S \Phi_{\begin{bmatrix} 0 \\ 0 \\ 1 - 3y^2 \end{bmatrix}} = 0. \tag{6.10.12} \]

This last equality comes from the fact that the vector field is vertical, and has
no flow through the vertical cylinder. Finally parametrize $C_1$ in the obvious
way:

\[ t \mapsto \begin{pmatrix} \cos t \\ \sin t \\ 0 \end{pmatrix}, \tag{6.10.13} \]

which is compatible with the counterclockwise orientation of $C_1$, and compute

\[ \int_{C_1} W_{\vec F} = \int_0^{2\pi} \begin{bmatrix} (\sin t)^3 \\ \cos t \\ 0 \end{bmatrix} \cdot \begin{bmatrix} -\sin t \\ \cos t \\ 0 \end{bmatrix} dt = \int_0^{2\pi} \bigl(-(\sin t)^4 + \cos^2 t\bigr)\,dt = -\frac{3}{4}\pi + \pi = \frac{\pi}{4}. \tag{6.10.14} \]

So the work over C is

\[ \int_C W_{\vec F} = -\int_{C_1} W_{\vec F} = -\frac{\pi}{4}. \tag{6.10.15} \]

Since C is oriented clockwise, and $C_1$ is oriented counterclockwise, $C + C_1$ form the oriented boundary of S. If you walk on S along C, in the clockwise direction, with your head pointing away from the z-axis, the surface is to your left; if you do the same along $C_1$, counterclockwise, the surface is still to your left.

What if both curves were oriented clockwise? Denote these curves by $C^+$ and $C_1^+$, and denote by $C^-$ and $C_1^-$ the curves oriented counterclockwise. Then (leaving out the integrands to simplify notation) we would have

\[ \int_{C^+} - \int_{C_1^+} = \int_S, \quad\text{but}\quad -\int_{C_1^+} = \int_{C_1^-}, \]

so $\int_{C^+} W_{\vec F}$ remains unchanged.

If both were oriented counterclockwise, so that C did not have the boundary orientation of S, we would have

\[ -\int_{C^-} + \int_{C_1^-} = \int_S = 0; \]

instead of $\int_{C^+} = -\int_{C_1^-}$, we have $\int_{C^-} = \int_{C_1^-}$.
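As a cross-check, the work over C can also be computed directly, without Stokes’s theorem. The sketch below rests on an assumption: the first two components of the vector field, $y^3$ and $x$, are read off from the integrand in Equation 6.10.14, while the third component is taken here to be $z$. Any third component that depends on $z$ alone would give the same answer, since its contribution around a closed curve is zero. The function name and step count are our own.

```python
import math


def work_over_C(n=4000):
    """Work of F = (y^3, x, z) over C = {x^2 + y^2 = 1, z = sin(xy) + 2},
    parametrized so that the polar angle decreases (clockwise seen from above).
    The third component z of F is an assumption made for this sketch."""
    total = 0.0
    dt = 2 * math.pi / n
    for i in range(n):
        t = (i + 0.5) * dt
        x, y = math.cos(t), -math.sin(t)      # polar angle is -t
        dx, dy = -math.sin(t), -math.cos(t)   # derivatives with respect to t
        z = math.sin(x * y) + 2
        dz = math.cos(x * y) * (dx * y + x * dy)
        total += (y**3 * dx + x * dy + z * dz) * dt
    return total


print(work_over_C(), -math.pi / 4)
```

The two printed values should agree to many digits, because the integrand is smooth and periodic in t.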


The divergence theorem 

The divergence theorem is also known as Gauss’s theorem. 

Theorem 6.10.6 (The divergence theorem). Let M be a bounded
domain in $\mathbb R^3$ with the standard orientation of space, and let its boundary
$\partial M$ be a union of surfaces $S_i$, each oriented by the outward normal. Let φ
be a 2-form field defined on a neighborhood of M. Then

\[ \int_M d\varphi = \sum_i \int_{S_i} \varphi. \tag{6.10.16} \]

Again, let’s make this look a bit more classical. Write $\varphi = \Phi_{\vec F}$, so that
$d\varphi = d\Phi_{\vec F} = \rho_{\operatorname{div} \vec F}$, and let $\vec N$ be the unit outward-pointing vector field on the
$S_i$; then Equation 6.10.16 can be rewritten

\[ \iiint_M \operatorname{div} \vec F\,dx\,dy\,dz = \sum_i \iint_{S_i} \vec F \cdot \vec N\,|d^2x|. \tag{6.10.17} \]

When we discussed Stokes’s theorem, we saw that $\vec F \cdot \vec N\,|d^2x|$, evaluated on a
parallelogram tangent to the surface, is the same thing as the flux of $\vec F$ evaluated
on the same parallelogram. So indeed Equation 6.10.17 is the same as

\[ \int_M d\Phi_{\vec F} = \int_M \rho_{\operatorname{div} \vec F} = \sum_i \int_{S_i} \Phi_{\vec F}. \tag{6.10.18} \]

Remark. We think Equations 6.10.9 and 6.10.17 are a good reason to avoid
the classical notation. For one thing, they bring in $\vec N$, which will usually involve
dividing by the square root of the length; this is messy, and also unnecessary,
since the $|d^2x|$ term will cancel with the denominator. More seriously, the
classical notation hides the resemblance of this special Stokes’s theorem and
the divergence theorem to the general one, Theorem 6.9.2. On the other hand,
the classical notation has a geometric immediacy that really speaks to people
who are used to it. △


Example 6.10.7 (Divergence theorem). Let Q be the unit cube. What is 


the flux of the vector field 


o 

x*y 
-2 yz 

*V 


through the boundary of Q if Q carries 


the standard orientation of R 3 and the boundary has the boundary orientation? 
The divergence theorem asserts that 


/ * 
JdQ 


x*y 
-2 yz 

*v 


= f p r x 'y 1 = f ( 2 *y - 2z ) l<* 3 4 

div - 2y J 


6.10.19 


*V 


This can readily be computed by Fubini’s theorem:

\[ \int_0^1 \int_0^1 \int_0^1 (2xy - 2z)\,dx\,dy\,dz = \frac12 - 1 = -\frac12. \qquad \triangle \tag{6.10.20} \]
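The triple integral is easy to confirm numerically. The sketch below uses only the divergence $2xy - 2z$ appearing in Equation 6.10.19, so it does not depend on the individual components of the vector field; the function name and grid size are our own choices.

```python
def integrate_divergence(n=20):
    """Midpoint-rule integral of div F = 2xy - 2z over the unit cube [0, 1]^3."""
    h = 1.0 / n
    mids = [(i + 0.5) * h for i in range(n)]
    total = 0.0
    for x in mids:
        for y in mids:
            for z in mids:
                total += (2 * x * y - 2 * z) * h**3
    return total


print(integrate_divergence())
```

Because the integrand is linear in each variable separately, the midpoint rule is exact here up to rounding, so even a coarse grid reproduces the value obtained by Fubini’s theorem.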


Example 6.10.8 (The principle of Archimedes). Archimedes is said to
have been asked by Hiero, the tyrant of Syracuse, to determine whether his
crown was really made of gold. Archimedes discovered that by weighing the
crown when suspended in water, he could determine whether or not it was
counterfeit. According to legend, he made the discovery in the bath, and proceeded
to run naked through the streets, crying “Eureka” (“I have found it”).

The principle he claimed is the following: A body immersed in a fluid receives 
a buoyant force equal to the weight of the displaced fluid. 

We do not understand how he came to this conclusion, and the derivation 

we will give of the result uses mathematics that was certainly not available to 
Archimedes. 

The force the fluid exerts on the immersed body is due to pressure. Suppose
that the body is M, with boundary $\partial M$ made up of little oriented parallelograms
$P_i$. The fluid exerts a force approximately

\[ p(x_i)\,\operatorname{Area}(P_i)\,\vec n_i, \tag{6.10.21} \]

where $\vec n_i$ is an inner pointing unit vector perpendicular to $P_i$ and $x_i$ is a point
of $P_i$; this becomes a better and better approximation as $P_i$ becomes small
so that the pressure on it becomes approximately constant. The total force
exerted by the fluid is the sum of the forces exerted on all the little pieces of
the boundary.

How did Archimedes find this result without the divergence theorem? He may have thought of the body as made up of little cubes, perhaps separated by little sheets of water. Then the force exerted on the body is the sum of the forces exerted on all the little cubes. Archimedes’s law is easy to see for one cube of side s, where the vertical coordinate of the top of the cube is z, which is a negative number (z = 0 is the surface of the water).

The lateral forces obviously cancel, and the force on the top is vertical, of magnitude $s^2 \mu g z$, and the force on the bottom is also vertical, of magnitude $-s^2 \mu g (z - s)$, so the total force is $s^3 \mu g$, which is precisely the weight of a cube of the fluid of side s.

If a body is made of lots of little cubes separated by sheets of water, all the forces on the interior walls cancel, so it doesn’t matter whether the sheets of water are there or not, and the total force on the body is buoyant, of magnitude equal to the weight of the displaced fluid. Note how similar this ad hoc argument is to the proof of Stokes’s theorem.

Thus the force is naturally a surface integral, and in fact is really an integral
of a 2-form field, since the orientation of $\partial M$ matters. But we can’t think of it
as a single 2-form field: the force has three components, and we have to think
of each of them as a 2-form field. In fact, the force is

\[ \begin{bmatrix} \int_{\partial M} p\,\Phi_{\vec e_1} \\ \int_{\partial M} p\,\Phi_{\vec e_2} \\ \int_{\partial M} p\,\Phi_{\vec e_3} \end{bmatrix}, \tag{6.10.22} \]

since

\[ \begin{bmatrix} p\,\Phi_{\vec e_1} \\ p\,\Phi_{\vec e_2} \\ p\,\Phi_{\vec e_3} \end{bmatrix} \bigl(P_x(\vec v_1, \vec v_2)\bigr) = p(x) \begin{bmatrix} \det[\vec e_1, \vec v_1, \vec v_2] \\ \det[\vec e_2, \vec v_1, \vec v_2] \\ \det[\vec e_3, \vec v_1, \vec v_2] \end{bmatrix} = p(x)(\vec v_1 \times \vec v_2) = p(x)\,\operatorname{Area}\bigl(P_x(\vec v_1, \vec v_2)\bigr)\,\vec n. \tag{6.10.23} \]

In an incompressible fluid on the surface of the earth, the pressure is of the form
$p(x) = -\mu g z$, where μ is the density, and g is the gravitational constant. Thus
the divergence theorem tells us that if $\partial M$ is oriented in the standard way, i.e.,
by the outward normal, then

\[ \text{Total force} = \begin{bmatrix} \int_{\partial M} \mu g z\,\Phi_{\vec e_1} \\ \int_{\partial M} \mu g z\,\Phi_{\vec e_2} \\ \int_{\partial M} \mu g z\,\Phi_{\vec e_3} \end{bmatrix} = \begin{bmatrix} \int_M \rho_{\nabla \cdot (\mu g z \vec e_1)} \\ \int_M \rho_{\nabla \cdot (\mu g z \vec e_2)} \\ \int_M \rho_{\nabla \cdot (\mu g z \vec e_3)} \end{bmatrix}. \tag{6.10.24} \]

The divergences are:

\[ \nabla \cdot (\mu g z \vec e_1) = \nabla \cdot (\mu g z \vec e_2) = 0 \quad\text{and}\quad \nabla \cdot (\mu g z \vec e_3) = \mu g. \tag{6.10.25} \]

Thus the total force is

\[ \begin{bmatrix} 0 \\ 0 \\ \int_M \rho_{\mu g} \end{bmatrix}, \tag{6.10.26} \]

and the third component is the weight of the displaced fluid; the force is oriented
upwards.

This proves the Archimedes principle. △
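To see the principle at work on a body that is not a cube, one can add up the pressure forces on a submerged sphere numerically and compare with the weight of the displaced fluid. The sketch below is an illustration under our own assumptions: the body is a sphere, the numerical values of μ and g are those of water in SI units, and the function name and parameters are hypothetical.

```python
import math


def buoyant_force_on_sphere(radius, depth_of_center, mu=1000.0, g=9.81, n=2000):
    """Vertical component of the pressure force on a fully submerged sphere.

    The pressure is p = -mu * g * z (z = 0 is the water surface, z < 0 below),
    and the force on a surface element is -p * (outward unit normal) * area.
    """
    c = -depth_of_center                      # z-coordinate of the center
    dphi = math.pi / n
    total = 0.0
    for i in range(n):
        phi = (i + 0.5) * dphi                # angle from the upward vertical
        z = c + radius * math.cos(phi)
        normal_z = math.cos(phi)              # vertical part of outward normal
        ring_area = 2 * math.pi * radius**2 * math.sin(phi) * dphi
        total += mu * g * z * normal_z * ring_area
    return total


radius, depth = 0.5, 2.0
force = buoyant_force_on_sphere(radius, depth)
weight_of_displaced_fluid = 1000.0 * 9.81 * (4.0 / 3.0) * math.pi * radius**3
print(force, weight_of_displaced_fluid)
```

The force comes out positive (upward) and does not depend on the depth of the center, as long as the sphere stays fully submerged.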


6.11 Potentials 


A very important question that constantly comes up in physics is: when is a 
vector field conservative? The gravitational vector field is conservative: if you 
climb from sea level to an altitude of 500 meters by bicycle and then return to 
your starting point, the total work against gravity is zero, whatever your actual 
path. Friction is not conservative, which is why you actually get tired during 
such a trip. 

A very important question that constantly comes up in geometry is: when 
does a space have a “hole” in it? 

We will see in this section that these two questions are closely related. 





Another way of stating independence of path is to require that the work around any closed path be zero; if $\gamma_1$ and $\gamma_2$ are two paths from x to y, then $\gamma_1 - \gamma_2$ is a closed loop. Requiring that the integral around it be zero is the same as requiring that the works along $\gamma_1$ and $\gamma_2$ be equal. It should be clear why under these conditions the vector field is called conservative.

Why obvious? We are trying to undo a gradient, i.e., a derivative, so it is natural to integrate.

Remember, if f is a function on $\mathbb R^3$ and $\vec F$ is a vector field, then $df = W_{\nabla f}$. So if we show that $df = W_{\vec F}$, we will have shown that $\vec F = \nabla f$, i.e., that $\vec F$ is the gradient of the function f.


Conservative vector fields and their potentials 

Asking whether a vector field is conservative is equivalent to asking whether it 
is the gradient of a function. 


Theorem 6.11.1. A vector field is the gradient of a function if and only if
it is conservative: i.e., if and only if the work of the vector field along any 
path depends only on the endpoints, and not on the oriented path joining 
them. 


Proof. Suppose $\vec F$ is the gradient of a function f: $\vec F = \nabla f$. Then by Theorem
6.9.2, for any parametrized path

\[ \gamma : [a, b] \to \mathbb R^n \tag{6.11.1} \]

we have (Theorem 6.10.1)

\[ \int_{\gamma([a,b])} W_{\nabla f} = f(\gamma(b)) - f(\gamma(a)). \tag{6.11.2} \]

Clearly, the work of a vector field that is the gradient of a function depends
only on the endpoints: the path taken between those points doesn’t matter.

It is a bit harder to show that path independence implies that the vector
field is the gradient of a function. First we need to find a candidate for the
function f, and there is an obvious choice: choose any point $x_0$ in the domain
of $\vec F$, and define

\[ f(x) = \int_{\gamma(x)} W_{\vec F}, \tag{6.11.3} \]

where γ(x) is an arbitrary path from $x_0$ to x: our independence of path
condition guarantees that the choice does not matter.

Now we have to see that $\vec F = \nabla f$, or alternatively that $W_{\vec F} = df$. We know
that

\[ df\bigl(P_x(\vec v)\bigr) = \lim_{h \to 0} \frac1h \bigl(f(x + h\vec v) - f(x)\bigr), \tag{6.11.4} \]

and (remembering the definition of f in Equation 6.11.3) $f(x + h\vec v) - f(x)$ is
the work of $\vec F$ first from x back to $x_0$, then from $x_0$ to $x + h\vec v$. By independence
of path, we may replace this by the work from x to $x + h\vec v$ along the straight
line. Parametrize the segment in the obvious way (by $\gamma : t \mapsto x + t\vec v$, with
$0 \le t \le h$) to get

\[ df\bigl(P_x(\vec v)\bigr) = \lim_{h \to 0} \frac1h \int_0^h \underbrace{\vec F(x + t\vec v)}_{\vec F(\gamma(t))} \cdot \underbrace{\vec v}_{\gamma'(t)}\,dt = \vec F(x) \cdot \vec v, \tag{6.11.5} \]

i.e., $df = W_{\vec F}$. □

Definition 6.11.2 (Potential). A function f such that $\operatorname{grad} f = \vec F$ is called
a potential of $\vec F$.

A vector field has more than one potential, but pretty clearly, two such
potentials f and g differ by a constant, since

\[ \operatorname{grad}(f - g) = \operatorname{grad} f - \operatorname{grad} g = \vec F - \vec F = 0; \tag{6.11.6} \]

the only functions with gradient 0 are the constants.

So when does a vector field have a potential, and how do we find it? The
first question turns out to be less straightforward than might appear. There
is a necessary condition: in order for a vector field $\vec F$ to be the gradient of a
function, it must satisfy

\[ \operatorname{curl} \vec F = 0. \tag{6.11.7} \]

Theorem 6.11.1 provides one answer, but it isn’t clear how to use it; it would mean checking the integral along all closed paths. (Of course you definitely can use it to show that a vector field is not a gradient: if you can find one closed path along which the work of a vector field is not 0, then the vector field is definitely not a gradient.)

This follows immediately from Theorem 6.7.7: ddf = 0. Since $df = W_{\nabla f}$, then
if $\vec F = \nabla f$,

\[ dW_{\vec F} = \Phi_{\operatorname{curl} \vec F} = d\,df = 0; \tag{6.11.8} \]

the flux of the curl of $\vec F$ can be 0 only if the curl is 0.

Some textbooks declare this condition to be sufficient also, but this is not 
true, as the following example shows. 


Example 6.11.3 (Necessary but not sufficient). Consider the vector field

\[ \vec F = \frac{1}{x^2 + y^2} \begin{bmatrix} -y \\ x \\ 0 \end{bmatrix} \tag{6.11.9} \]

on $\mathbb R^3$ with the z-axis removed. Then

\[ \operatorname{curl} \vec F = \begin{bmatrix} 0 \\ 0 \\ D_1 \dfrac{x}{x^2 + y^2} + D_2 \dfrac{y}{x^2 + y^2} \end{bmatrix}, \tag{6.11.10} \]

and the third entry gives

\[ \frac{(x^2 + y^2) - 2x^2}{(x^2 + y^2)^2} + \frac{(x^2 + y^2) - 2y^2}{(x^2 + y^2)^2} = 0. \tag{6.11.11} \]

But $\vec F$ cannot be written $\nabla f$ for any function f : ($\mathbb R^3$ - z-axis) → $\mathbb R$. Indeed,
using the standard parametrization

\[ \gamma(t) = \begin{pmatrix} \cos t \\ \sin t \\ 0 \end{pmatrix}, \tag{6.11.12} \]

the work of $\vec F$ around the unit circle oriented counterclockwise gives

\[ \int_{S^1} W_{\vec F} = \int_0^{2\pi} \underbrace{\begin{bmatrix} -\sin t \\ \cos t \\ 0 \end{bmatrix}}_{\vec F(\gamma(t))} \cdot \underbrace{\begin{bmatrix} -\sin t \\ \cos t \\ 0 \end{bmatrix}}_{\gamma'(t)} dt = 2\pi. \tag{6.11.13} \]

Recall (Equation 5.6.1) that the formula for integrating a work form over an oriented curve is

\[ \int_\gamma W_{\vec F} = \int_a^b \vec F(\gamma(t)) \cdot \gamma'(t)\,dt. \]

The unit circle is often denoted $S^1$.

This cannot occur for work of a conservative vector field: we started at one
point and returned to the same point, so if the vector field were conservative,
the work would be zero.

We will now play devil’s advocate. We claim

\[ \vec F = \nabla \left( \arctan \frac{y}{x} \right) \tag{6.11.14} \]

and will leave the checking to you as Exercise 6.11.1. Why doesn’t this
contradict the statement above, that $\vec F$ cannot be written $\nabla f$? The answer is
that

\[ \arctan \frac{y}{x} \tag{6.11.15} \]

is not a function, or at least, it cannot be defined as a continuous function on
$\mathbb R^3$ minus the z-axis. Indeed, it really is the polar angle θ, and the polar angle
cannot be defined on $\mathbb R^3$ minus the z-axis; if you take a walk counterclockwise
on a closed path around the origin, taking your polar angle with you, when you
get back where you started your angle will have increased by 2π. △
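Both halves of this example can be checked numerically: the curl vanishes at any point off the z-axis, yet the work around the unit circle is not zero. The following sketch uses central differences for the curl and a midpoint rule for the work; the function names, the sample point, and the step sizes are our own choices.

```python
import math


def F(x, y, z):
    """The vector field of Equation 6.11.9, defined off the z-axis."""
    r2 = x * x + y * y
    return (-y / r2, x / r2, 0.0)


def curl_third_entry(x, y, z, h=1e-5):
    """Central-difference estimate of D1 F2 - D2 F1."""
    d1_f2 = (F(x + h, y, z)[1] - F(x - h, y, z)[1]) / (2 * h)
    d2_f1 = (F(x, y + h, z)[0] - F(x, y - h, z)[0]) / (2 * h)
    return d1_f2 - d2_f1


def work_around_unit_circle(n=1000):
    """Midpoint-rule work of F along t -> (cos t, sin t, 0), 0 <= t <= 2 pi."""
    dt = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        fx, fy, _ = F(math.cos(t), math.sin(t), 0.0)
        total += (fx * -math.sin(t) + fy * math.cos(t)) * dt
    return total


print(curl_third_entry(0.7, -0.4, 1.3), work_around_unit_circle(), 2 * math.pi)
```

The first printed number should be close to 0 and the other two should agree, which is exactly the combination that a conservative field cannot produce.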


A pond is convex if you can 
swim in a straight line from any 
point of the pond to any other. 
A pond with an island is never 
convex. 


Example 6.11.3 shows exactly what is going wrong. There isn’t any problem 
with F, the problem is with the domain. We can expect trouble any time we 
have a domain with holes in it (the hole in this case being the z-axis, since F 
is not defined there). The function / such that V/ = F is determined only up 
to an additive constant, and if you go around the hole, there is no reason to 
think that you will not add on a constant in the process. So to get a converse 
to Equation 6.11.7, we need to restrict our domains to domains without holes. 
This is a bit complicated to define, so instead we will restrict them to convex 
domains. 

Definition 6.11.4 (Convex domain). A domain $U \subset \mathbb R^n$ is convex if for
any two points x and y of U, the straight line segment [x, y] joining x to y
lies entirely in U.


Theorem 6.11.5. If $U \subset \mathbb R^3$ is convex, and $\vec F$ is a vector field on U, then
$\vec F$ is the gradient of a function f defined on U if and only if $\operatorname{curl} \vec F = 0$.



We have been considering the question, when is a 1-form (vector field) the exterior derivative (gradient) of a 0-form (function)? The Poincaré lemma addresses the general question, when is a k-form the exterior derivative of a (k - 1)-form? In the case of a 2-form on $\mathbb R^4$, this question is of central importance for understanding electromagnetism. The 2-form

\[ W_{\vec E} \wedge c\,dt + \Phi_{\vec B}, \]

where $\vec E$ is the electric field and $\vec B$ is the magnetic field, is the force field of electromagnetism, known as the Faraday.

The statement that the Fara- 
day is the exterior derivative of 
a 1-form ensures that the electro- 
magnetic potential exists; it is the 
1-form whose exterior derivative is 
the Faraday. 

Unlike the gravitational poten- 
tial, the electromagnetic potential 
is not unique up to an additive 
constant. Different 1-forms exist 
such that their exterior derivative 
is the Faraday. The choice of 1- 
form is called the choice of gauge; 
gauge theory is one of the dominant ideas of modern physics.

Proof. The proof is very similar to the proof of Theorem 6.11.1. First we need
to find a candidate for a function f, and there is again an “obvious” choice.
Choose a point $x_0 \in U$, and set

\[ f(x) = \int_{\gamma(x)} W_{\vec F}, \tag{6.11.16} \]

where this time γ(x) is specifically the straight line joining $x_0$ to x. Note that
this is possible because U is convex; if U were a pond with an island, the straight
line might go through the island (where the vector field is undefined).

Now we need to show that $\nabla f = \vec F$. Again,

\[ \nabla f(x) \cdot \vec v = \lim_{h \to 0} \frac1h \bigl(f(x + h\vec v) - f(x)\bigr), \tag{6.11.17} \]

and $f(x + h\vec v) - f(x)$ is the work of $\vec F$ along the path that goes straight from
x to $x_0$ and then straight on to $x + h\vec v$. We wish to replace this by the path
that goes straight from x to $x + h\vec v$. We don’t have path independence to allow
this, but we can do it by Stokes’s theorem. Indeed, the three oriented segments
$[x, x_0]$, $[x_0, x + h\vec v]$, and $[x + h\vec v, x]$ together bound a triangle T, so the work of
$\vec F$ around the triangle is equal to zero:

\[ \int_{\partial T} W_{\vec F} = \int_T dW_{\vec F} = \int_T \Phi_{\operatorname{curl} \vec F} = 0. \tag{6.11.18} \]

We can now rewrite Equation 6.11.17:

\[ \nabla f(x) \cdot \vec v = \lim_{h \to 0} \frac1h \Bigl( \underbrace{\int_{[x, x_0]} W_{\vec F}}_{-f(x)} + \underbrace{\int_{[x_0, x + h\vec v]} W_{\vec F}}_{f(x + h\vec v)} \Bigr) = \lim_{h \to 0} \frac1h \int_{[x, x + h\vec v]} W_{\vec F}. \tag{6.11.19} \]

The proof finishes as above (Equation 6.11.5). □


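Example 6.11.6 (Finding the potential of a vector field). Let us carry 
out the computation in the proof above in one specific case. Consider the vector 
field 

$$\vec F \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{bmatrix} y^2/2 + yz \\ x(y + z) \\ xy \end{bmatrix}, \tag{6.11.20}$$

whose curl is indeed 0: 

$$\vec\nabla \times \vec F = \begin{bmatrix} D_1 \\ D_2 \\ D_3 \end{bmatrix} \times \begin{bmatrix} y^2/2 + yz \\ x(y + z) \\ xy \end{bmatrix} = \begin{bmatrix} x - x \\ -y + y \\ (y + z) - (y + z) \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}. \tag{6.11.21}$$

Since $\vec F$ is defined on all of $\mathbb{R}^3$, which is certainly convex, Theorem 6.11.5 asserts 
that $\vec F = \vec\nabla f$, where 

$$f(\mathbf{a}) = \int_{\gamma_{\mathbf{a}}} W_{\vec F}, \quad \text{for } \gamma_{\mathbf{a}}(t) = t\mathbf{a},\ 0 \le t \le 1, \tag{6.11.22}$$

i.e., $\gamma_{\mathbf{a}}$ is a parametrization of the segment joining $\mathbf{0}$ to $\mathbf{a}$. If we set $\mathbf{a} = \begin{pmatrix} a \\ b \\ c \end{pmatrix}$, 
this leads to 

$$f \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \int_0^1 \begin{bmatrix} (tb)^2/2 + t^2 bc \\ ta(tb + tc) \\ t^2 ab \end{bmatrix} \cdot \begin{bmatrix} a \\ b \\ c \end{bmatrix} dt = \left[\frac{t^3}{3}\right]_0^1 (3ab^2/2 + 3abc) = ab^2/2 + abc. \tag{6.11.23}$$

This means that 

$$f \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \frac{xy^2}{2} + xyz, \tag{6.11.24}$$

and it is easy to check that $\vec\nabla f = \vec F$. 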
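
The computation in this example can be reproduced numerically. This is a minimal sketch, assuming the vector field of Equation 6.11.20; the function names (`potential_by_line_integral`, `grad_f`) and the test point are illustrative choices, not notation from the text. It evaluates the line integral of Equation 6.11.22 with Simpson's rule and compares a finite-difference gradient of the closed form with $\vec F$.

```python
def F(x, y, z):
    """Vector field of the example: (y^2/2 + yz, x(y + z), xy)."""
    return (y * y / 2 + y * z, x * (y + z), x * y)


def potential_by_line_integral(a, b, c, n=100):
    """Integral over 0 <= t <= 1 of F(t a) . a dt, by Simpson's rule (n even)."""
    def integrand(t):
        fx, fy, fz = F(t * a, t * b, t * c)
        return fx * a + fy * b + fz * c

    step = 1.0 / n
    total = integrand(0.0) + integrand(1.0)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(k * step)
    return total * step / 3


def f(x, y, z):
    """Closed form found in the example: x y^2 / 2 + x y z."""
    return x * y * y / 2 + x * y * z


def grad_f(x, y, z, eps=1e-5):
    """Central-difference approximation of the gradient of f."""
    return (
        (f(x + eps, y, z) - f(x - eps, y, z)) / (2 * eps),
        (f(x, y + eps, z) - f(x, y - eps, z)) / (2 * eps),
        (f(x, y, z + eps) - f(x, y, z - eps)) / (2 * eps),
    )


point = (1.0, 2.0, 3.0)
line_value = potential_by_line_integral(*point)
closed_value = f(*point)
gradient = grad_f(*point)
field = F(*point)
print(line_value, closed_value)
print(gradient, field)
```

Simpson's rule is exact here because the integrand is a multiple of $t^2$, so the only discrepancy between the two values is floating-point rounding.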
Exercises for Section 6.1: 
Forms as Integrands Over Oriented Domains 

In Exercise 6.1.1, parts (f), (g), (h), $f$ is a $C^1$ function on a neighborhood of $[0, 1]$. 


6.1.1 An integrand should take a piece of the domain, and return a number, 
in such a way that if we decompose a domain into little pieces, evaluate the 
integrand on the pieces and add, the sums should have a limit as the decomposition 
becomes infinitely fine. What will happen if we break up $[0, 1]$ into 
intervals $[x_i, x_{i+1}]$, for $i = 0, 1, \dots, n - 1$, with $0 = x_0 < x_1 < \dots < x_n = 1$, 
and assign one of the numbers below to each of the $[x_i, x_{i+1}]$? 

(a) $|x_{i+1} - x_i|^2$ 
(b) $\sin|x_i - x_{i+1}|$ 
(c) $\sqrt{|x_i - x_{i+1}|}$ 
(d) $|(x_{i+1})^2 - (x_i)^2|$ 
(e) $|(x_{i+1})^3 - (x_i)^3|$ 
(f) $|f(x_{i+1}) - f(x_i)|$ 
(g) $|f((x_{i+1})^2) - f(x_i^2)|$ 
(h) $|(f(x_{i+1}))^2 - (f(x_i))^2|$ 
(i) $|x_{i+1} - x_i| \log|x_{i+1} - x_i|$ 


6.1.2 Same exercise as 6.1.1 but in $\mathbb{R}^2$, the integrand to be integrated over 
$[0, 1]^2$; the integrand takes a rectangle $a < x < b$, $c < y < d$ and returns the 
number 

(a) $|b - a|^2 \sqrt{|c - d|}$    (b) $|ac - bd|$    (c) $(ad - bc)^2$ 

Exercises for Section 6.2: 
Forms on $\mathbb{R}^n$ 

6.2.1 Complete the proof of Proposition 6.2.11. 


6 . 2.2 Compute the following numbers: 



(b) e x dy 



(a) dx.3 A dx 2 
Exercises for Section 6.3: 

Integration Over 
Parametrized Domains 


( 


(c) x\ dx 3 A dx 2 A dx\ 


( 


\ 


P° 
( 2 \ 
0 
0 

Vo/ 


ri 
2 
3 

V L4j 


r o 
1 

-1 


r l 

-l 

-l 


L iJ L oJ/ 


\ 


/ 


6.2.3 Compute the following functions: 

( 


\ 


(a) sin(x 4 ) dx 3 A dx 2 


P° 
/*»\ 
*2 

*3 

\x 4 / 


V 


rn 

2 

3 


r 01 
1 

-1 


L4J L iJ 


) 


( b ) e z dy | \ (2) 


/ 


\ 


(c) xfe x 3 dx 3 A dx 2 A dx\ 


\ 


/ 

■v 


■ 0 - 


' 1- 

\ 

. 1 

2 


1 


-1 


(-*'\ 

3 


-1 


-1 


** \ 

.4. 


. 1. 


. 0 . 

/ 

-X 3 I 





V X 4 J 






) 


6.2.4 Prove Proposition 6.2.16. 


6.2.5 Verify that Example 6.2.14 does not commute, and that Example 6.2.15 
does. 

6.3.1 Set up each of the following integrals of form fields over parametrized 
domains as an ordinary multiple integral, and compute it. 

( sin£ \ 
cos t I . 

(b) Iy(u) xd y Adz * where u = [-Ml x [-1,1]. and 7^) = + v j . 


(c) 


■Ar(V) i, dx 2 Adx 3 + x 2 dx 3 Adx 4 , where l/={(“)|o<u, V;U + t ;<2j, 


uv 

.2 . ..2 


and7 («) = I u u !r 

log(tl + V + 1) 


(d) /y(C7) x 2 dx\ Adx 3 Adx 4 , where U — ( [ v 


0 < u, 1/, u;; u + v + w < 3 > , 
Exercises for Section 6.4: 
Form Fields and Vector Calculus 


u 

and 7 ( v | = 

w 



6.3.2 Set up each of the following integrals of form fields over parametrized 
domains as an ordinary multiple integral. 


t 3 

(a) f y(J) y 2 dy + x 2 dz , where I = [0, a) and 7(t) = | t 2 + 1 
y ' 1 t 2 - 1 


2 

tr - v 


(b) S-y(V) s * n 1 / 2 ^ a where U = [ 0 , a] x [ 0 , bj, and 7 ( ^ | UV A 

(c) f 7 (U)( x ' + x a) dx2 A dx 3y where U = | |v| < u < l}, 


e u \ 

»—v 


and 7(l|) = 6 

\vj 1 cost* 

\ sin v / 

(d) X2X4 dx\ A dx 3 A dx 4, where 





(iy — l) 2 > u 2 + v 2 , 0 < w < 1 f , and 7 [ v | = 

w 


u 


/ u + v \ 
u - v 
w + v 
\w — v / 


V 

and F = 

r 21 
x z 

xy 

xy 

—z 


X 

m J 


of the vector fields F = 


(b) For what vector field $\vec F$ is each of the following 1-form fields in $\mathbb{R}^3$ the 
work form field $W_{\vec F}$? 

(i) $xy\,dx - y^2\,dz$; (ii) $y\,dx + 2\,dy - 3x\,dz$. 

(c) For what vector field $\vec F$ is each of the following 2-form fields in $\mathbb{R}^3$ the 
flux form field $\Phi_{\vec F}$? 

(i) $2z^4\,dx \wedge dy + 3y\,dy \wedge dz - x^2 z\,dx \wedge dz$; (ii) $x_2 x_3\,dx \wedge dz - x_1 x_2\,dy \wedge dz$. 


6 . 4.2 What is the work form field Wp(F*(i I)) of the vector field 



f 1 y 1 = 

z 


2 

x y 
x -y 
-z 


0 

, at a = | 1 | , evaluated on the vector u = 
2 


1 

-1 

1 
*\ 

-X 

y = 

y 2 

V) 

x y_ 


evaluated on P° 


1 

0 

1 


0 

1 

0 


1 

at the point x = ( 2 | ? 

-1 


6.4.4 Evaluate the work of each of the following vector fields $\vec F$ on the given 
1-parallelograms: 


(a) F = 


x 

y 


on 




(b) F = 


x * 

sin xy 


on P°, 


(c) F = 

' y ' 

X 

Oil P°. 

( 1 \ 

2' 

3 

(d) F = 


Z 

: 

-1 



sin y 
cos(x + z) 




on P\ 


0 

1 

-l 


0 

1 

0 


6.4.5 What is the density form of the function / ( y 

z 


— xy + z 2 , evaluated 



V 


'2' 


‘O' 

I 2 I on the vectors 

0 


1 

, and 

1 

vJ 

i 


1 

— 


1 


6.4.6 Given the vector field F ^ y ) = 

r 


V 

x + z 
xz 


,-l 


what is 


, the function / | y 

z 


— XZ+ 


and the vectors Vi = 

‘0* 

1 

,V2 - 

'l' 

l 

ii 

■ > 

-r 

i 


1 


0 


i 

» m 


(1) the work form Wp(P£(\ j))? 

(2) the flux form $j?(P2(v i,v 2 ))? 

(3) the density form p f (P£(v u v 2 , v 3 ))? 


6.4.7 Evaluate the flux of each of the following vector fields $\vec F$ on the given 
2-parallelograms: 
sin y 

cos(x + z) 
e x 





6.4.8 Verify that $\det[\vec F(\mathbf{x}), \vec v_1, \dots, \vec v_{n-1}]$ is an $(n - 1)$-form field, so that 
Definition 6.4.10 of the flux form on $\mathbb{R}^n$ makes sense. 


Exercises for Section 6.5: 
Orientation 





Figure 6.5.7. 
Surfaces for Exercise 6.5.7: 
which are orientable? 


6.5.1 (a) Let $C \subset \mathbb{R}^2$ be the circle of equation $x^2 + y^2 = 1$. Find the unit 
vector field $\vec t$ describing the orientation “increasing polar angle.” 

(b) Now do the same for the circle of equation $(x - 1)^2 + y^2 = 4$. 

(c) Explain carefully why the phrase “increasing polar angle” does not describe 
an orientation of the circle of equation $(x - 2)^2 + y^2 = 1$. 

6.5.2 Prove that if a linear transformation T is not one to one, then it is not 
orientation preserving or reversing. 

6.5.3 In Example 6.5.16, does $dx_1 \wedge dy_2$ define an orientation of $S$? Is it the 
same as the orientation given by $dx_1 \wedge dy_1$? 

6.5.4 Show that the ad hoc definitions of orientation-preserving parametriza- 
tions (Definitions 6.5.11 and 6.5.12) are special cases of Definition 6.5.15. 

6.5.5 Let $z_1 = x_1 + iy_1$, $z_2 = x_2 + iy_2$ be coordinates in $\mathbb{C}^2$. Consider the 
surface $S$ in $\mathbb{C}^2$ parametrized by 

7 : 2 ^ (e“*) ’ * = z + < 1, |y| < 1 

which we will orient by requiring that $\mathbb{C}$ be given the standard orientation, and 
that $\gamma$ be orientation preserving. What is 

$$\int_S dx_1 \wedge dy_1 + dy_1 \wedge dx_2 + dx_2 \wedge dy_2\,?$$

6.5.6 Let $z_1 = x_1 + iy_1$, $z_2 = x_2 + iy_2$ be coordinates in $\mathbb{C}^2$. 

Compute the integral of $dx_1 \wedge dy_1 + dy_1 \wedge dx_2$ over the part of the locus of 
equation z 2 — z f where |zi| < 1. 

6.5.7 Which of the surfaces in Figure 6.5.7 are orientable? 

6.5.8 (a) Let $X \subset \mathbb{R}^n$ be a manifold of the form $X = f^{-1}(0)$ where $f : \mathbb{R}^n \to \mathbb{R}$ 
is a $C^1$ function and $[Df(\mathbf{x})] \ne 0$ for all $\mathbf{x} \in X$. Let $\vec v_1, \dots, \vec v_{n-1}$ be elements 
of $T_{\mathbf{x}}(X)$. Show that 

$$\omega(\vec v_1, \dots, \vec v_{n-1}) = \det\bigl(\vec\nabla f(\mathbf{x}), \vec v_1, \dots, \vec v_{n-1}\bigr)$$

defines an orientation of $X$. 

(b) What is the relation of this definition and the definition of the boundary 
orientation? 

(c) Let $X \subset \mathbb{R}^n$ be an $(n - m)$-dimensional manifold of the form $X = \mathbf{f}^{-1}(\mathbf{0})$ 
where $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m$ is a $C^1$ function and $[D\mathbf{f}(\mathbf{x})]$ is onto for all $\mathbf{x} \in X$. Let 
$\vec v_1, \dots, \vec v_{n-m}$ be elements of $T_{\mathbf{x}}(X)$. Show that 

$$\omega(\vec v_1, \dots, \vec v_{n-m}) = \det\bigl(\vec\nabla f_1(\mathbf{x}), \dots, \vec\nabla f_m(\mathbf{x}), \vec v_1, \dots, \vec v_{n-m}\bigr)$$

defines an orientation of $X$. 


6.5.9 Consider the map $\mathbb{R}^2 \to \mathbb{R}^3$ given by spherical coordinates 

$$\begin{pmatrix} \theta \\ \varphi \end{pmatrix} \mapsto \begin{pmatrix} \cos\varphi \cos\theta \\ \cos\varphi \sin\theta \\ \sin\varphi \end{pmatrix}.$$

The image of this mapping is the unit sphere, which we will orient by the 
outward-pointing normal. In what part of $\mathbb{R}^2$ is this mapping orientation preserving? 
In what part is it orientation reversing? 



6.5.10 (a) Find a 2-form ip on the plane of equation £ + j/ + z = 0so that 

if the projection 
preserving. 

(b) Repeat, but this time find a 2-form a so that if the projection is oriented 
by a, it is orientation reversing. 



^ ^ is oriented by <p, the projection is orientation- 


6.5.11 Let $S$ be the part of the surface of equation $z = \sin xy + 2$ where 
$x^2 + y^2 < 1$ and $x > 0$, oriented by the upward-pointing normal. What is the 
flux of the vector field $\begin{bmatrix} 0 \\ 0 \\ x + y \end{bmatrix}$ through $S$? 


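6.5.12 Is the map 

$$\begin{pmatrix} \theta \\ \varphi \end{pmatrix} \mapsto \begin{pmatrix} \cos\varphi \cos\theta \\ \cos\varphi \sin\theta \\ \sin\varphi \end{pmatrix}, \quad 0 \le \theta, \varphi \le \pi$$

an orientation preserving parametrization of the unit sphere oriented by the 
outward-pointing normal? the inward-pointing normal? 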


6.5.13 What is the integral 

$$\int_S x_3\, dx_1 \wedge dx_2 \wedge dx_4$$

where $S$ is the part of the three-dimensional manifold of equation $x_4 = x_1 x_2 x_3$ 
where $0 \le x_1, x_2, x_3 \le 1$, oriented by $dx_1 \wedge dx_2 \wedge dx_3$? Hint: this surface is a 
graph, so it is easy to parametrize it. 
6.5.14 Find the work of the vector field F ( * ) = ( *$ ) around the boundary 
of the rectangle with vertices (o)'(a)’(a)’(o)’ oriented so that these 
vertices appear in that order. 


6.5.15 Find the work of the vector field 

~( x \ ( x2 \ 

F \ y = \ y 2 I over the arc of helix parametrized by t 

w v*v 





In Exercises 6.5.16 and 6.5.17, 
part of the problem is finding 
parametrizations of S that pre- 
serve orientation. 


with 0 < t < a, and oriented by increasing t. 


6.5.16 Find the flux of the vector field $\vec F \begin{pmatrix} x \\ y \\ z \end{pmatrix} = r^a \begin{bmatrix} x \\ y \\ z \end{bmatrix}$, where $r = \sqrt{x^2 + y^2 + z^2}$, 
and $a$ is a number, through the surface $S$, where $S$ is the sphere 
of radius $R$ oriented by the outward-pointing normal. The answer should be 
some function of $a$ and $R$. 


6.5.17 Find the flux of the vector field F 



y 

= [ -z | , through 5, where 
yz 

$S$ is the part of the cone $z = \sqrt{x^2 + y^2}$ where $x, y > 0$, $x^2 + y^2 < R$, and it 
is oriented by the upward pointing normal (i.e., the flux measures the amount 
flowing into the cone). 


6.5.18 What is the flux of the vector field $\begin{bmatrix} x \\ -y \\ xy \end{bmatrix}$ through the surface $z = \sqrt{x^2 + y^2}$, $x^2 + y^2 < 1$, 
oriented by the outward normal? 

Hint for Exercise 6.5.19, part (b): Show that you cannot choose 
an orientation for $M_1(2, 3)$ so that $\varphi_1$ and $\varphi_2$, as defined in Exercise 3.2.10, 
are both orientation preserving. 

Hint for Exercise 6.5.19, part (c): Use the same method as in 
(b); this time you can find an orientation of $M_1(3, 3)$ such that all 
three of $\varphi_1$, $\varphi_2$ and $\varphi_3$ are orientation preserving. 


6.5.19 This exercise has Exercise 3.2.10 as a prerequisite. Let $M_1(n, m)$ be 
the space of $n \times m$ matrices of rank 1. (a) Show that $M_1(2, 2)$ is orientable. 
(This follows from Exercise 3.2.6 (a) and 6.5.8 (a).) 

(*b) Show that $M_1(2, 3)$ is not orientable. 

(*c) Show that $M_1(3, 3)$ is orientable. 

6.5.20 Consider the surface S in C 3 parametrized by 



Exercises for Section 6.6: 
Boundary Orientation 


Exercises for Section 6.7: 
The Exterior Derivative 
which we will orient by requiring that $\mathbb{C}$ be given the standard orientation, and 
that $\gamma$ be orientation-preserving. What is 

$$\int_S dx_1 \wedge dy_1 + dx_2 \wedge dy_2 + dx_3 \wedge dy_3\,?$$


6.6.1 Consider the curve S of equation x 2 + y 2 = 1, oriented by the tangent 


vector 


at the 


5 point (o)- 


(a) Show that the subset X where x > 0 is a piece-with-boundary of S. 


What is its oriented boundary? 

(b) Show that the subset Y where |xj < 1/2 is a piece-with-boundary of S. 
What is its oriented boundary? 

(c) Is the subset Z where x > 0 a piece-with-boundary? If so, what is its 
boundary? 


6.6.2 Consider the region $X = P \cap B \subset \mathbb{R}^3$, where $P$ is the plane of equation 
$x + y + z = 0$, and $B$ is the ball $x^2 + y^2 + z^2 \le 1$. We will orient $P$ by the normal 
$\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$, and the sphere $x^2 + y^2 + z^2 = 1$ by the outward-pointing normal. 

(a) Which of the forms $dx \wedge dy$, $dx \wedge dz$, $dy \wedge dz$ define the given orientation 
of $P$? 
(b) Show that X is a piece-with-boundary of P, and that the mapping 

/ cost suit \ 

( 72-~76 \ 


cost sint 
o sint 

V W6 / 


0 < t < 2n 


is a parametrization of dX. 

(c) Is the parametrization compatible with the boundary orientation of $\partial X$? 

(d) Do any of the 1-forms $dx$, $dy$, $dz$ define its orientation at every point? 

(e) Do any of the 1-forms $x\,dy - y\,dx$, $x\,dz - z\,dx$, $y\,dz - z\,dy$ define its 
orientation at every point? 


6.7.1 What is the exterior derivative of 

(a) sin (xyz) dx in 3R 3 ; (b) xix$ dx 2 A dx^ in R 4 ; 

(c) x ? ^1 A • • • A dxi A • • • A dx n in R n . 


6.7.2 (a) Is there a function $f$ on $\mathbb{R}^3$ such that 

(1) $df = \cos(x + yz)\,dx + y\cos(x + yz)\,dy + z\cos(x + yz)\,dz$? 

(2) $df = \cos(x + yz)\,dx + z\cos(x + yz)\,dy + y\cos(x + yz)\,dz$? 

(b) Find the function when it exists. 

6.7.3 Find all the 1-forms $\omega = p(y, z)\,dx + q(x, z)\,dy$ such that 

$$d\omega = x\,dy \wedge dz + y\,dx \wedge dz.$$

6.7.4 (a) Let v? = xyz dy. Compute from the definition the number 

/ ' 

Pi ■ - 


d<p 


Ui) 


) 


(b) What is dpi Use your result to check the computation in (a). 

6.7.5 (a) Let p = £1X3 dx 2 A dx 4. Compute from the definition the number 

dp(P^(e 2,e 3 ,e 4 )) . 

(b) What is dp ? Use your result to check the computation in (a). 

6.7.6 (a) Let p — x \ dx 3. Compute from the definition the number 

(b) What is dp? Use your result to check the computation in (b). 

6.7.7 (a) There is an exponent $m$ such that 

$$\vec\nabla \cdot \left( (x^2 + y^2 + z^2)^m \begin{bmatrix} x \\ y \\ z \end{bmatrix} \right) = 0;$$

find it. 

(b*) More generally, there is an exponent $m$ (depending on $n$) such that the 
$(n - 1)$-form $\Phi_{r^m \vec r}$ has exterior derivative 0, when $\vec r$ is the vector field $\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$ 
and $r = |\vec r|$. Can you find it? (Start with $n = 1, 2$.) 

6.7.8 Show that each face of a $(k + 2)$-parallelogram is a $(k + 1)$-dimensional 
parallelogram, and that each edge of the $(k + 1)$-parallelogram is also the edge 
of another $(k + 1)$-parallelogram, but with opposite orientation. 

Exercises for Section 6.8: 
The Exterior Derivative in $\mathbb{R}^3$ 

6.8.1 Compute the gradients of the following functions: 


(a) 

'(;)« 

(b) 


;)-«■ 

(c) 


;)=x 2 + y 2 

(d) 

N 

1 

N 

H 

II 

H 

(e) 

'(1 

( ) = sin(a: -I- y) 

(f) 

/(; 

n =log(i 2 + j( 2 ) 





= xyz 



log \x + y + z\ 


0 ) 



xyz 

—n ~*V 

x 2 + 2T + z 


1 


6.8.2 (a) For what vector field $\vec F$ is the 1-form on $\mathbb{R}^3$ 

$$x^2\,dx + y^2 z\,dy + xy\,dz$$

the work form field $W_{\vec F}$? 

(b) Compute the exterior derivative of $x^2\,dx + y^2 z\,dy + xy\,dz$ using Theorem 
6.7.3 (computing the exterior derivative of a $k$-form), and show that it is the 
same as $\Phi_{\vec\nabla \times \vec F}$. 

6.8.3 (a) For what vector field $\vec F$ is the 2-form on $\mathbb{R}^3$ 

$$(xy)\,dx \wedge dy + (x)\,dy \wedge dz + (xy)\,dx \wedge dz$$

the flux form field $\Phi_{\vec F}$? 

(b) Compute the exterior derivative of $(xy)\,dx \wedge dy + (x)\,dy \wedge dz + (xy)\,dx \wedge dz$ 
using Theorem 6.7.3, and show that it is the same as the density form field of 
$\operatorname{div} \vec F$. 


6.8.4 (a) Show that if $\vec F = \begin{bmatrix} F_1 \\ F_2 \end{bmatrix} = \operatorname{grad} f$ is a vector field in the plane which 
is the gradient of a $C^2$ function, then $D_2 F_1 = D_1 F_2$. 

(b) Show that this is not true if $f$ is only of class $C^1$. 

6.8.5 Which of the vector fields of Exercise 1.1.5 are gradients of functions? 

6.8.6 Prove the equations 

$$\operatorname{curl}(\operatorname{grad} f) = \vec 0 \quad \text{and} \quad \operatorname{div}(\operatorname{curl} \vec F) = 0$$

for any function $f$ and any vector field $\vec F$ (at least of class $C^2$) using the formulas 
of Theorem 6.8.3. 


6.8.7 (a) What is dW 

'o’ 

0 

? What is dW 

o' 

0 

(P 

t> 

O' 

0 


. X, 


. X. 


.0. 


(®i» © 3 ))? 


(b) Compute dW 


o' 


o' 

0 


0 

. X. 


. 0 . 


(© 1 , © 3 )) directly from the definition. 


6.8.8 (a) Find a book on electromagnetism (or a tee-shirt) and write Maxwell’s 
laws. 

Let $\vec E$ and $\vec B$ be two vector fields on $\mathbb{R}^4$, parametrized by $x, y, z, t$. 

(b) Compute $d(W_{\vec E} \wedge c\,dt + \Phi_{\vec B})$. 

(c) Compute $d(W_{\vec B} \wedge c\,dt - \Phi_{\vec E})$. 

(d) Show that two of Maxwell’s equations can be condensed into 

$$d(W_{\vec E} \wedge c\,dt + \Phi_{\vec B}) = 0.$$

(e) How can you write the other two Maxwell’s equations using forms? 

Exercises for Section 6.9: 
Stokes’s Theorem in $\mathbb{R}^n$ 
6.8.9 (a) What is the exterior derivative of W r - ? (h\ Df d>. _ ? 


X 

? (b) Of $ 

r 

X 

y 

, z . 


V 
. z . 


6.8.10 Compute the divergence and curl of the vector fields 

(a) $\begin{bmatrix} x^2 y \\ -2yz \\ x^3 y^2 \end{bmatrix}$ and (b) $\begin{bmatrix} \sin xz \\ \cos yz \\ xyz \end{bmatrix}$. 


f x \ 

'x 2 ' 

M = 

y 2 

V*/ 



(b) Use part (a) to compute 


(0 


(c) Compute it again, directly from the definition. 


6.9.1 Let $V$ be a compact piece-with-boundary of $\mathbb{R}^3$. Show that the volume 
of $V$ is given by 

$$\int_{\partial V} \frac{1}{3}\bigl(z\,dx \wedge dy + y\,dz \wedge dx + x\,dy \wedge dz\bigr).$$

6.9.2 (a) Find the unique polynomial $p$ such that $p(1) = 1$ and such that if 

$$\omega = x\,dy \wedge dz - 2z\,p(y)\,dx \wedge dy + y\,p(y)\,dz \wedge dx,$$

then $d\omega = dx \wedge dy \wedge dz$. 

(b) For this polynomial $p$, find the integral $\int_S \omega$, where $S$ is that part of the 
sphere $x^2 + y^2 + z^2 = 1$ where $z \ge \sqrt{2}/2$, oriented by the outward-pointing 
normal. 

6.9.3 What is the integral 

$$\int_S x\,dy \wedge dz + y\,dz \wedge dx + z\,dx \wedge dy$$

over the part of the cone of equation $z = a - \sqrt{x^2 + y^2}$ where $z \ge 0$, oriented by 
the upwards-pointing normal? (The volume of a cone is $\frac{1}{3} \cdot \text{height} \cdot \text{area of base}$.) 

6.9.4 Compute the integral of $x_1\,dx_2 \wedge dx_3 \wedge dx_4$ over the part of the three-dimensional 
manifold of equation 

$$x_1 + x_2 + x_3 + x_4 = a, \quad x_1, x_2, x_3, x_4 \ge 0,$$

oriented so that the projection to the $(x_1, x_2, x_3)$-coordinate 3-space is orientation 
preserving. 

Exercises for Section 6.10: 
The Integral Theorems of Vector Calculus 

6.9.5 (a) Compute the exterior derivative of the 2-form 

$$\varphi = \frac{x\,dy \wedge dz + y\,dz \wedge dx + z\,dx \wedge dy}{(x^2 + y^2 + z^2)^{3/2}}.$$

(b) Compute the integral of $\varphi$ over the unit sphere $x^2 + y^2 + z^2 = 1$, oriented 
by the outward-pointing normal. 

(c) Compute the integral of $\varphi$ over the boundary of the cube of side 4, 
centered at the origin, and oriented by the outward-pointing normal. 

(d) Can $\varphi$ be written $d\theta$ for some 1-form $\theta$ on $\mathbb{R}^3 - \{\mathbf{0}\}$? 

6.9.6 What is the integral of 

$$x\,dy \wedge dz + y\,dz \wedge dx + z\,dx \wedge dy$$

over the part $S$ of the ellipsoid 

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$$

where $x, y, z > 0$, oriented by the outward-pointing normal? (You may use 
Stokes’s theorem, or parametrize the surface.) 

6.9.7 (a) Parametrize the surface in 4-space given by the equations 

$$x_1^2 + x_2^2 = a^2, \quad x_3^2 + x_4^2 = b^2.$$

(b) Integrate the 2-form $x_1 x_2\,dx_2 \wedge dx_3$ over this surface. 

(c) Compute $d(x_1 x_2\,dx_2 \wedge dx_3)$. 

(d) Represent the surface as the boundary of a three-dimensional manifold 
in $\mathbb{R}^4$, and verify that Stokes’s theorem is true in this case. 

6.9.8 Use Stokes’s theorem to prove the statement in the caption of Figure 
6.7.1, in the special case where the surface $S$ is a parallelogram: i.e., prove that 
the integral of the “element of solid angle” over a parallelogram $S$ is the 
same as its integral over the corresponding $P$. 


6.10.1 Suppose $U \subset \mathbb{R}^3$ is open, $\vec F$ is a vector field on $U$, and $\mathbf{a}$ is a point of 
$U$. Let $S_r(\mathbf{a})$ be the sphere of radius $r$ centered at $\mathbf{a}$, oriented by the outward 
pointing normal. Compute 


lim -r 

j — >0 T * 
Hint for Exercise 6.10.2: use 
cylindrical coordinates. 


In Exercise 6.9.6, S is a box 
without a top. 

This is a “shaggy dog” exercise, 
with lots of irrelevant detail! 

6.10.2 (a) Let $X$ be a bounded region in the $(x, z)$-plane where $x > 0$, and 
call $Z_\alpha$ the part of $\mathbb{R}^3$ swept out by rotating $X$ around the $z$-axis by an angle 
$\alpha$. Find a formula for the volume of $Z_\alpha$, in terms of an integral over $X$. 

(b) Let $X$ be the circle of radius 1 in the $(x, z)$-plane, centered at the point 
$x = 2$, $z = 0$. What is the volume of the torus obtained by rotating it around 
the $z$-axis by a full circle? 

(c) What is the flux of the vector field $\begin{bmatrix} x \\ y \\ z \end{bmatrix}$ through the part of the boundary 
of this torus where $y > 0$, oriented by the normal pointing out of the torus? 

6.10.3 Let F be the vector field F = V What is the work of F 

along the parametrized curve 


7 (*) = 


t count 
t 
t 


0 < t < 1, oriented so that 7 is orientation preserving? 


6.10.4 What is the integral of 



around the boundary of the 11-sided regular polygon inscribed in the unit circle, 
with a vertex at $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$, oriented as the boundary of the polygon? 


6.10.5 Let $S$ be the surface of equation $z = 9 - y^2$, oriented by the upward-pointing 
normal. 

(a) Sketch the piece $X \subset S$ where $x > 0$, $z > 0$ and $y > x$, indicating 
carefully the boundary orientation. 

(b) Give a parametrization of $X$, being careful about the domain of the 
parametrizing map, and whether it is orientation preserving. 

(c) Find the work of the vector field $\vec F \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{bmatrix} 0 \\ xz \\ 0 \end{bmatrix}$ around the boundary 
of $X$. 

6.10.6 Let $C$ be a closed curve in the plane. Show that the two vector fields 
$\begin{bmatrix} y \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ x \end{bmatrix}$ do opposite work around $C$. 

6.10.7 Suppose $U \subset \mathbb{R}^3$ is open, $\vec F$ is a vector field on $U$, $\mathbf{a}$ is a point of $U$, 
and $\vec v \ne \vec 0$ is a vector in $\mathbb{R}^3$. Let $U_R$ be the disk of radius $R$ in the plane of 
equation $(\mathbf{x} - \mathbf{a}) \cdot \vec v = 0$, centered at $\mathbf{a}$, oriented by the normal vector field $\vec v$, 
and let $\partial U_R$ be its boundary, with the boundary orientation. 
Compute 

$$\lim_{R \to 0} \frac{1}{R^2} \int_{\partial U_R} W_{\vec F}.$$

Hint for Exercise 6.10.9: the 
first step is to find a closed curve 
of which C is a piece. 

Hint: think of integrating $x\,dy$ 
around the triangle. 

Exercises for Section 6.11: 
Potentials 


6.10.8 Let $U \subset \mathbb{R}^3$ be a subset bounded by a surface $S$, which we will give 
the boundary orientation. What relation is there between the volume of $U$ and 
the flux $\int_S \Phi_{\left[\begin{smallmatrix} x \\ y \\ z \end{smallmatrix}\right]}$? 

6.10.9 Compute the integral } c Wp, where F (* ) = 
the upper half-circle x 2 + y 2 = 1, y > 0, oriented clockwise. 

r- 2 

6.10.10 Find the flux of the vector field 


xy 

£22J L + x 
y+i ^ x 


, and C is 


y 


through the surface of the 


unit sphere, oriented by the outward-pointing normal. 

6.10.11 Use Green’s theorem to calculate the area of the triangle with vertices 
$\begin{pmatrix} a_1 \\ b_1 \end{pmatrix}$, $\begin{pmatrix} a_2 \\ b_2 \end{pmatrix}$, $\begin{pmatrix} a_3 \\ b_3 \end{pmatrix}$. 

6.10.12 What is the work of the vector field 
y 2 = l, z = 3 oriented by the tangent vector 


-33/ 

3x 

1 


around the circle x 2 + 


0' 


V 

-1 

at 

0 

0 


3 


6.10.13 What is the flux of the vector field 

$$\vec F \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{bmatrix} x + yz \\ y + xz \\ z + xy \end{bmatrix}$$

through the boundary of the region in the first octant 
$x, y, z > 0$ where $z < 4$ and $x^2 + y^2 < 4$, oriented by the outward-pointing 
normal? 

6.11.1 For the vector field of Example 6.11.3, show (Equation 6.11.14) that 

$$\vec F = \vec\nabla \left( \arctan \frac{y}{x} \right).$$
6.11.2 A charge of $c$ coulombs per meter on a vertical wire $x = a$, $y = b$ 
creates an electric potential 

$$V \begin{pmatrix} x \\ y \\ z \end{pmatrix} = c \log\bigl((x - a)^2 + (y - b)^2\bigr).$$

Several such wires produce a potential which is the sum of the potentials due 
to the individual wires. 

(a) What is the electric field due to a single wire going through the point 
with charge per length c — 1 coul/m, where coul is the unit for charge. 

(b) Sketch the potential due to two wires, both charged with 1 coul/m, one 
going through the point ( q ) ’ anc * t ^ ie ot ^ er through ( q ) ' 

(c) Do the same if the first wire is charged with 1 coul/m and the other with 
-1 coul/m. 


6.11.3 

{ 0 }? 


(a) Is the vector field 


^T7 

j 


the gradient of a function on M 2 - 


(b) Is the vector field 



on R 3 the curl of another vector field? 


6.11.4 Find a 1-form $\varphi$ such that $d\varphi = y\,dx \wedge dz - x\,dy \wedge dz$. 

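6.11.5 Let $\vec F$ be the vector field on $\mathbb{R}^3$ 

$$\vec F \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{bmatrix} F_1(x, y) \\ F_2(x, y) \\ 0 \end{bmatrix}.$$

Suppose $D_2 F_1 = D_1 F_2$. Show that there exists a function $f : \mathbb{R}^3 \to \mathbb{R}$ such 
that $\vec F = \vec\nabla f$. 


*6.11.6 (a) Show that a 1-form $\varphi$ on $\mathbb{R}^2 - \mathbf{0}$ can be written $df$ exactly when 
$d\varphi = 0$ and $\int_{S^1} \varphi = 0$, where $S^1$ is the unit circle, oriented counterclockwise. 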

(b) Show that a 1-form p on R 2 - j ( [j ) , ( | can be written c(f exactly 
when dp = 0 and both p = 0, p — 0 where S\ is the circle of radius 1/2 

centered at the origin, and S 2 is the circle of radius 1/2 centered at both 
oriented counterclockwise. 




Appendix A: Some Harder Proofs 


A.0 Introduction 

When this book was first used in manuscript form as a textbook for the 
standard first course in multivariate calculus at Cornell University, all proofs 
were included in the main text and some students became anxious, feeling, 
despite assurances to the contrary, that because a proof was in the text, they 
were expected to understand it. We have thus moved to this appendix certain 
more difficult proofs. They are intended for students using this book for a 
class in analysis, and for the occasional student in a beginning course who has 
mastered the statement of the theorem and wishes to delve further. 

In addition to proofs of theorems stated in the main text, the appendix 
includes material not covered in the main text, in particular rules for arithmetic 
involving o and O (Appendix A.8), Taylor’s theorem with remainder (Appendix 
A.9), two theorems concerning compact sets (Appendix A. 17), and a discussion 
of the pullback (Appendix A.21). 

A.1 Proof of the Chain Rule 

Theorem 1.8.2 (Chain rule). Let $U \subset \mathbb{R}^n$, $V \subset \mathbb{R}^m$ be open sets, let 
$g : U \to V$ and $f : V \to \mathbb{R}^p$ be mappings, and let $a$ be a point of $U$. If $g$ is 
differentiable at $a$ and $f$ is differentiable at $g(a)$, then the composition $f \circ g$ 
is differentiable at $a$, and its derivative is given by 

$$[D(f \circ g)(a)] = [Df(g(a))] \circ [Dg(a)]. \tag{1.8.12}$$

Proof. To prove the chain rule, you must set about it the right way; this 
is already the case in one-variable calculus. The right approach (at least, one 
that works) is to define two “remainder” functions, $r(\vec h)$ and $s(\vec k)$. The function 
$r(\vec h)$ gives the difference between the increment to the function $g$ and its 
linear approximation at $a$. The function $s(\vec k)$ gives the difference between the 
increment to $f$ and its linear approximation at $g(a)$: 

$$\underbrace{g(a + \vec h) - g(a)}_{\text{increment to function}} - \underbrace{[Dg(a)]\vec h}_{\text{linear approx.}} = r(\vec h) \tag{A1.1}$$

$$\underbrace{f(g(a) + \vec k) - f(g(a))}_{\text{increment to } f} - \underbrace{[Df(g(a))]\vec k}_{\text{linear approx.}} = s(\vec k). \tag{A1.2}$$

... a beginner will do well 
to accept plausible results without 
taxing his mind with subtle proofs, 
so that he can concentrate on assimilating 
new notions, which are 
not “evident”. -Jean Dieudonné, 
Calcul Infinitésimal 
The hypotheses that $g$ is differentiable at $a$ and that $f$ is differentiable at $g(a)$ 
say exactly that 

$$\lim_{\vec h \to \vec 0} \frac{r(\vec h)}{|\vec h|} = 0 \quad \text{and} \quad \lim_{\vec k \to \vec 0} \frac{s(\vec k)}{|\vec k|} = 0. \tag{A1.3}$$

Now we rewrite Equations A1.1 and A1.2 in a form that will be more convenient: 

$$g(a + \vec h) = g(a) + [Dg(a)]\vec h + r(\vec h) \tag{A1.4}$$

$$f(g(a) + \vec k) = f(g(a)) + [Df(g(a))]\vec k + s(\vec k), \tag{A1.5}$$

and then write: 

$$\begin{aligned} f(g(a + \vec h)) &= f\bigl(g(a) + \underbrace{[Dg(a)]\vec h + r(\vec h)}_{\vec k,\ \text{left-hand side Eq. A1.5}}\bigr) \\ &= f(g(a)) + [Df(g(a))]\bigl(\underbrace{[Dg(a)]\vec h + r(\vec h)}_{\vec k}\bigr) + s\bigl(\underbrace{[Dg(a)]\vec h + r(\vec h)}_{\vec k}\bigr) \\ &= f(g(a)) + [Df(g(a))]\bigl([Dg(a)]\vec h\bigr) + \underbrace{[Df(g(a))]\bigl(r(\vec h)\bigr) + s\bigl([Dg(a)]\vec h + r(\vec h)\bigr)}_{\text{remainder}}. \end{aligned} \tag{A1.6}$$

In the first line, we are just 
evaluating $f$ at $g(a + \vec h)$, plugging 
in the value for $g(a + \vec h)$ given 
by the right-hand side of Equation 
A1.4. We then see that $[Dg(a)]\vec h + r(\vec h)$ 
plays the role of $\vec k$ in the 
left side of Equation A1.5. In the 
second line we plug this value for 
$\vec k$ into the right side of Equation 
A1.5. 

To go from the second to the 
third line we use the linearity of 
$[Df(g(a))]$: 

$$[Df(g(a))]\bigl([Dg(a)]\vec h + r(\vec h)\bigr) = [Df(g(a))][Dg(a)]\vec h + [Df(g(a))]r(\vec h).$$

We can subtract $f(g(a))$ from both sides of Equation A1.6, to get 

$$\underbrace{f(g(a + \vec h)) - f(g(a))}_{\text{increment to composition}} = \underbrace{\overbrace{[Df(g(a))]}^{\substack{\text{linear approx.} \\ \text{to } f \text{ at } g(a)}} \bigl(\overbrace{[Dg(a)]\vec h}^{\substack{\text{linear approx.} \\ \text{to } g \text{ at } a}}\bigr)}_{\text{composition of linear approximations}} + \text{remainder}. \tag{A1.7}$$

The “composition of linear approximations” is the linear approximation of the 
increment to $f$ at $g(a)$ as evaluated on the linear approximation of the increment 
to $g$ at $\vec h$. 

What we want to prove is that the linear approximation above is in fact the 
derivative of $f \circ g$ as evaluated on the increment $\vec h$. To do this we need to prove 
that the limit of the remainder divided by $|\vec h|$ is 0 as $\vec h \to \vec 0$: 

$$\lim_{\vec h \to \vec 0} \frac{[Df(g(a))]\bigl(r(\vec h)\bigr) + s\bigl([Dg(a)]\vec h + r(\vec h)\bigr)}{|\vec h|} = 0. \tag{A1.8}$$

Let us look at the two terms in this limit separately. The first is straightforward. 
Since (Proposition 1.4.11) 

$$\bigl|[Df(g(a))]r(\vec h)\bigr| \le \bigl|[Df(g(a))]\bigr|\,|r(\vec h)|, \tag{A1.9}$$
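we have 

$$\lim_{\vec h \to \vec 0} \frac{\bigl|[Df(g(a))]r(\vec h)\bigr|}{|\vec h|} \le \bigl|[Df(g(a))]\bigr| \underbrace{\lim_{\vec h \to \vec 0} \frac{|r(\vec h)|}{|\vec h|}}_{=\,0 \text{ by Eq. A1.3}} = 0. \tag{A1.10}$$

The second term is harder. We want to show that 

$$\lim_{\vec h \to \vec 0} \frac{s\bigl([Dg(a)]\vec h + r(\vec h)\bigr)}{|\vec h|} = 0. \tag{A1.11}$$

First note that there exists $\delta > 0$ such that $|r(\vec h)| < |\vec h|$ when $0 < |\vec h| < \delta$ (by 
Equation A1.3).¹ 

Thus, when $|\vec h| < \delta$, we have 

$$\bigl|[Dg(a)]\vec h + r(\vec h)\bigr| \le \bigl|[Dg(a)]\vec h\bigr| + \underbrace{|r(\vec h)|}_{\le |\vec h|} \le \bigl(\bigl|[Dg(a)]\bigr| + 1\bigr)|\vec h|. \tag{A1.12}$$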

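Now Equation A1.3 also tells us that for any $\epsilon > 0$, there exists $0 < \delta' < \delta$ 
such that when $|\vec k| < \delta'$, then $|s(\vec k)| \le \epsilon|\vec k|$. If you don’t see this right away, 
consider that for $|\vec k|$ sufficiently small (and nonzero), 

$$\frac{|s(\vec k)|}{|\vec k|} \le \epsilon; \quad \text{i.e.,} \quad |s(\vec k)| \le \epsilon|\vec k|. \tag{A1.13}$$

Otherwise the limit as $\vec k \to \vec 0$ would not be 0. We specify “$|\vec k|$ sufficiently small” 
by $|\vec k| < \delta'$. 

Now, when 

$$|\vec h| < \frac{\delta'}{\bigl|[Dg(a)]\bigr| + 1}; \quad \text{i.e.,} \quad \bigl(\bigl|[Dg(a)]\bigr| + 1\bigr)|\vec h| < \delta', \tag{A1.14}$$

then Equation A1.12 gives 

$$\bigl|[Dg(a)]\vec h + r(\vec h)\bigr| < \delta',$$

so we can substitute the expression $[Dg(a)]\vec h + r(\vec h)$ for $\vec k$ in the inequality 
$|s(\vec k)| \le \epsilon|\vec k|$, which is true when $|\vec k| < \delta'$. This gives 

$$\underbrace{\bigl|s\bigl([Dg(a)]\vec h + r(\vec h)\bigr)\bigr|}_{=\,|s(\vec k)|} \le \underbrace{\epsilon\bigl|[Dg(a)]\vec h + r(\vec h)\bigr|}_{=\,\epsilon|\vec k|} \underbrace{\le}_{\text{Eq. A1.12}} \epsilon\bigl(\bigl|[Dg(a)]\bigr| + 1\bigr)|\vec h|. \tag{A1.15}$$

Dividing by $|\vec h|$ gives 

$$\frac{\bigl|s\bigl([Dg(a)]\vec h + r(\vec h)\bigr)\bigr|}{|\vec h|} \le \epsilon\bigl(\bigl|[Dg(a)]\bigr| + 1\bigr), \tag{A1.16}$$

and since this is true for every $\epsilon > 0$, we have proved that the limit in Equation 
A1.11 is 0: 

$$\lim_{\vec h \to \vec 0} \frac{s\bigl([Dg(a)]\vec h + r(\vec h)\bigr)}{|\vec h|} = 0. \quad \square \tag{A1.11}$$

¹In fact, by choosing a smaller $\delta$, we could make $|r(\vec h)|$ as small as we like, getting 
$|r(\vec h)| \le \epsilon|\vec h|$ for any $\epsilon > 0$, but this will not be necessary; taking $\epsilon = 1$ is good enough 
(see Theorem 1.5.10). 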
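
The statement just proved can be observed numerically. The sketch below is an illustration only: the maps `g` and `f`, the base point `a`, and the direction are arbitrary choices made for this example, and `remainder_ratio` is a hypothetical helper. It computes the remainder of Equation A1.7 divided by $|\vec h|$ for shrinking increments and shows it tending to 0.

```python
import math


def g(x, y):
    """Example map g : R^2 -> R^2 (an arbitrary illustrative choice)."""
    return (x * y, x + y * y)


def Dg(x, y):
    """Jacobian matrix of g, as rows."""
    return ((y, x), (1.0, 2 * y))


def f(u, v):
    """Example map f : R^2 -> R (an arbitrary illustrative choice)."""
    return math.sin(u) + u * v


def Df(u, v):
    """Derivative of f, as a row matrix."""
    return (math.cos(u) + v, u)


def chain_rule_derivative(x, y):
    """Row matrix [Df(g(a))][Dg(a)] at a = (x, y)."""
    fu, fv = Df(*g(x, y))
    (g11, g12), (g21, g22) = Dg(x, y)
    return (fu * g11 + fv * g21, fu * g12 + fv * g22)


def remainder_ratio(a, direction, size):
    """|f(g(a + h)) - f(g(a)) - [Df(g(a))][Dg(a)]h| / |h| for h = size * direction."""
    hx, hy = direction[0] * size, direction[1] * size
    d1, d2 = chain_rule_derivative(*a)
    increment = f(*g(a[0] + hx, a[1] + hy)) - f(*g(*a))
    linear = d1 * hx + d2 * hy
    return abs(increment - linear) / math.hypot(hx, hy)


a = (0.5, -1.0)
ratios = [remainder_ratio(a, (0.6, 0.8), 10.0 ** -k) for k in range(1, 6)]
for k, ratio in enumerate(ratios, start=1):
    print(f"|h| = 1e-{k}: remainder / |h| = {ratio:.3e}")
```

For these smooth maps the ratio shrinks roughly in proportion to $|\vec h|$, which is stronger than what the theorem requires; the theorem only asserts that the ratio tends to 0.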


A.2 Proof of Kantorovitch’s Theorem 


Theorem 2.7.11 (Kantorovitch’s theorem). Let $a_0$ be a point in $\mathbb{R}^n$, $U$ 
an open neighborhood of $a_0$ in $\mathbb{R}^n$ and $f : U \to \mathbb{R}^n$ a differentiable mapping, 
with its derivative $[Df(a_0)]$ invertible. Define 

$$\vec h_0 = -[Df(a_0)]^{-1} f(a_0), \quad a_1 = a_0 + \vec h_0, \quad U_0 = \bigl\{ x \bigm| |x - a_1| \le |\vec h_0| \bigr\}. \tag{A2.1}$$

If the derivative $[Df(x)]$ satisfies the Lipschitz condition 

$$\bigl|[Df(u_1)] - [Df(u_2)]\bigr| \le M|u_1 - u_2| \quad \text{for all points } u_1, u_2 \in U_0, \tag{A2.2}$$

and if the inequality 

$$|f(a_0)|\,\bigl|[Df(a_0)]^{-1}\bigr|^2 M \le \frac{1}{2} \tag{A2.3}$$

is satisfied, the equation $f(x) = 0$ has a unique solution in $U_0$, and Newton’s 
method with initial guess $a_0$ converges to it. 
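
To see the hypotheses of the theorem at work, here is a minimal one-variable sketch. Everything specific in it is an assumption made for illustration: the function $f(x) = x^2 - 2$, the initial guess $a_0 = 1.5$, and the Lipschitz ratio $M = 2$ (valid because $f'(x) = 2x$). It evaluates the left side of Inequality A2.3, builds $U_0$, and runs Newton's method.

```python
import math


def f(x):
    """Illustrative function whose positive root is sqrt(2)."""
    return x * x - 2.0


def df(x):
    """Derivative of f; |df(u1) - df(u2)| = 2 |u1 - u2|, so M = 2."""
    return 2.0 * x


M = 2.0   # Lipschitz ratio of the derivative (assumed example value)
a0 = 1.5  # initial guess (assumed example value)

h0 = -f(a0) / df(a0)  # h0 = -[Df(a0)]^{-1} f(a0)
a1 = a0 + h0          # first Newton iterate, center of U0

# Left side of Inequality A2.3: |f(a0)| |[Df(a0)]^{-1}|^2 M
kantorovich_product = abs(f(a0)) * (1.0 / abs(df(a0))) ** 2 * M


def newton(a, steps=6):
    """Run a fixed number of Newton steps starting from a."""
    for _ in range(steps):
        a = a - f(a) / df(a)
    return a


root = newton(a0)
print("Kantorovitch product:", kantorovich_product)
print("U0 = [", a1 - abs(h0), ",", a1 + abs(h0), "]")
print("Newton limit:", root)
```

Here the product is well below 1/2, so the theorem applies, and the computed limit lies in the interval $U_0$ centered at $a_1$ with radius $|\vec h_0|$.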


Proof. The proof is fairly involved, so we will first outline our approach. We 
will show the following four facts: 

(1) $[Df(a_1)]$ is invertible, allowing us to define $\vec h_1 = -[Df(a_1)]^{-1} f(a_1)$; 

(2) $|\vec h_1| \le \dfrac{|\vec h_0|}{2}$; 

(3) $|f(a_1)|\,\bigl|[Df(a_1)]^{-1}\bigr|^2 \le |f(a_0)|\,\bigl|[Df(a_0)]^{-1}\bigr|^2$; 

(4) $|f(a_1)| \le \dfrac{M}{2}|\vec h_0|^2$.    A2.4 

Facts (1), (2) and (3) guarantee 
that the hypotheses about $a_0$ 
of our theorem are also true of 
$a_1$. We need (1) in order to define 
$\vec h_1$ and $a_2$ and $U_1$. Statement 
(2) guarantees that $U_1 \subset U_0$, 
hence $[Df(x)]$ satisfies the same 
Lipschitz condition on $U_1$ as on 
$U_0$. Statement (3) is needed to 
show that Inequality A2.3 is satisfied 
at $a_1$. (Remember that the 
ratio $M$ has not changed.) 

If (1), (2), (3) are true we can define sequences $\vec h_i$, $a_i$, $U_i$: 

$$\vec h_i = -[Df(a_i)]^{-1} f(a_i), \quad a_i = a_{i-1} + \vec h_{i-1}, \quad \text{and} \quad U_i = \bigl\{ x \bigm| |x - a_{i+1}| \le |\vec h_i| \bigr\},$$

and at each stage all the hypotheses of Theorem 2.7.11 are true. 

Statement (2), together with Proposition 1.5.30, also proves that the $a_i$ converge; 
let us call the limit $a$. Statement (4) will then say that $a$ satisfies $f(a) = 0$. 
Indeed, by (2), 
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In Example A2.2 we have no 
guarantee that the difference be- 
tween the increment to / and its 
approximation by f'{x)h behaves 
like h 2 . 


so by part (4). 

|/(a,)|<f|h,-i| 2 <fiho| 2 . 

and in the limit as i — > oo, we have |f(a)| = 0. 

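First we need to prove Proposition A2.1 and Lemma A2.3.

Proposition A2.1. If $U \subset \mathbb R^n$ is a ball and $\mathbf f : U \to \mathbb R^m$ is a differentiable mapping whose derivative satisfies the Lipschitz condition

$$|[D\mathbf f(\mathbf x)] - [D\mathbf f(\mathbf y)]| \le M|\mathbf x - \mathbf y|, \tag{A2.7}$$

then

$$\Bigl|\underbrace{\mathbf f(\mathbf x + \mathbf h) - \mathbf f(\mathbf x)}_{\text{increment to } \mathbf f} - \underbrace{[D\mathbf f(\mathbf x)]\mathbf h}_{\substack{\text{linear approx.}\\ \text{of increment to } \mathbf f}}\Bigr| \le \frac M2 |\mathbf h|^2. \tag{A2.8}$$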


Before embarking on the proof, let us see why this statement is reasonable. The term $[D\mathbf f(\mathbf x)]\mathbf h$ is the linear approximation of the increment to the function in terms of the increment $\mathbf h$ to the variable. You would expect the error term to be of second degree, i.e., some multiple of $|\mathbf h|^2$, which thus gets very small as $\mathbf h \to \mathbf 0$. That is what Proposition A2.1 says, and it identifies the Lipschitz ratio $M$ as the main ingredient of the coefficient of $|\mathbf h|^2$.

The coefficient $M/2$ on the right is the smallest coefficient that will work for all functions $\mathbf f : U \to \mathbb R^m$, although it is possible to find functions where an inequality with a smaller coefficient is satisfied. Equality is achieved for the function $f(x) = x^2$: we have

$$[Df(x)] = f'(x) = 2x, \quad \text{so} \quad |[Df(x)] - [Df(y)]| = 2|x - y|, \tag{A2.9}$$

and the best Lipschitz ratio is $M = 2$:

$$|f(x + h) - f(x) - 2xh| = |(x + h)^2 - x^2 - 2xh| = h^2 = \frac22 h^2 = \frac M2 h^2. \tag{A2.10}$$

If the derivative of $\mathbf f$ is not Lipschitz (as in Example A2.2) then it may be the case that there exists no $C$ such that

$$|\mathbf f(\mathbf x + \mathbf h) - \mathbf f(\mathbf x) - [D\mathbf f(\mathbf x)]\mathbf h| \le C|\mathbf h|^2. \tag{A2.11}$$


Example A2.2 (A derivative that is not Lipschitz). Let $f(x) = x^{4/3}$, so $[Df(x)] = f'(x) = \frac43 x^{1/3}$. In particular $f'(0) = 0$, so

$$|f(0 + h) - f(0) - f'(0)h| = h^{4/3}. \tag{A2.12}$$

But $h^{4/3}$ is not $\le C|h|^2$ for any $C$, since $h^{4/3}/h^2 = 1/h^{2/3} \to \infty$ as $h \to 0$.
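The contrast between the function $x^2$ and Example A2.2 can be checked numerically. The following is only a sketch: the helper `remainder`, the sample point $x = 1$ and the sample increments are our own choices for illustration and appear nowhere in the text.

```python
# Numerical check of the bound in Proposition A2.1, in one variable.

def remainder(f, df, x, h):
    """|f(x+h) - f(x) - f'(x) h|: the error of the linear approximation."""
    return abs(f(x + h) - f(x) - df(x) * h)

# f(x) = x^2 has f'(x) = 2x, which is Lipschitz with ratio M = 2,
# so the remainder should be at most (M/2) h^2 = h^2 (in fact equal to it).
M = 2.0
square_ok = all(
    remainder(lambda x: x * x, lambda x: 2 * x, 1.0, h) <= (M / 2) * h * h + 1e-12
    for h in (0.5, 0.1, 0.01)
)

# f(x) = x^(4/3) has f'(x) = (4/3) x^(1/3), which is not Lipschitz near 0.
# At x = 0 the remainder is h^(4/3), so remainder / h^2 = h^(-2/3) blows up.
ratios = [
    remainder(lambda x: x ** (4 / 3), lambda x: (4 / 3) * x ** (1 / 3), 0.0, h) / h**2
    for h in (1e-1, 1e-2, 1e-3, 1e-4)
]

print(square_ok)   # whether the bound held for every sample increment
print(ratios)      # grows as h shrinks: no constant C can work
```

The small tolerance `1e-12` only absorbs floating-point rounding; it is not part of the proposition.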
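Proof of Proposition A2.1. Consider the function $\mathbf g(t) = \mathbf f(\mathbf x + t\mathbf h)$. Each coordinate of $\mathbf g$ is a differentiable function of the single variable $t$, so the fundamental theorem of calculus says

$$\mathbf f(\mathbf x + \mathbf h) - \mathbf f(\mathbf x) = \mathbf g(1) - \mathbf g(0) = \int_0^1 \mathbf g'(t)\,dt. \tag{A2.13}$$

Using the chain rule, we see that

$$\mathbf g'(t) = [D\mathbf f(\mathbf x + t\mathbf h)]\mathbf h, \tag{A2.14}$$

which we will write as

$$\mathbf g'(t) = [D\mathbf f(\mathbf x)]\mathbf h + \bigl([D\mathbf f(\mathbf x + t\mathbf h)]\mathbf h - [D\mathbf f(\mathbf x)]\mathbf h\bigr). \tag{A2.15}$$

This leads to

$$\mathbf f(\mathbf x + \mathbf h) - \mathbf f(\mathbf x) = \int_0^1 [D\mathbf f(\mathbf x)]\mathbf h\,dt + \int_0^1 \bigl([D\mathbf f(\mathbf x + t\mathbf h)]\mathbf h - [D\mathbf f(\mathbf x)]\mathbf h\bigr)\,dt. \tag{A2.16}$$

The first term is the integral from 0 to 1 of a constant, so it is simply that constant, so we can rewrite Equation A2.16 as

$$\begin{aligned}
|\mathbf f(\mathbf x + \mathbf h) - \mathbf f(\mathbf x) - [D\mathbf f(\mathbf x)]\mathbf h| &= \Bigl|\int_0^1 \bigl([D\mathbf f(\mathbf x + t\mathbf h)]\mathbf h - [D\mathbf f(\mathbf x)]\mathbf h\bigr)\,dt\Bigr| \\
&\le \int_0^1 M|\mathbf x + t\mathbf h - \mathbf x|\,|\mathbf h|\,dt \\
&\le \int_0^1 Mt|\mathbf h|\,|\mathbf h|\,dt = \frac M2|\mathbf h|^2. \quad \square
\end{aligned} \tag{A2.17}$$

To go from the first to the second line of Equation A2.17 we use Equation A2.7.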


Proving Lemma A2.3 is the 
hardest part of proving Theorem 
2.7.11. At the level of this book, 
we don’t know much about in- 
verses of matrices, so we have to 
use “bare hands” techniques in- 
volving geometric series of matri- 
ces. 


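Lemma A2.3. The matrix $[D\mathbf f(\mathbf a_1)]$ is invertible, and

$$|[D\mathbf f(\mathbf a_1)]^{-1}| \le 2|[D\mathbf f(\mathbf a_0)]^{-1}|. \tag{A2.18}$$

Proof. We have required (Equation A2.2) that the derivative matrix not vary too fast, so it is reasonable to hope that

$$[D\mathbf f(\mathbf a_0)]^{-1}[D\mathbf f(\mathbf a_1)]$$

is not too far from the identity. Indeed, set

$$\begin{aligned}
A = I - [D\mathbf f(\mathbf a_0)]^{-1}[D\mathbf f(\mathbf a_1)] &= \underbrace{[D\mathbf f(\mathbf a_0)]^{-1}[D\mathbf f(\mathbf a_0)]}_{I} - [D\mathbf f(\mathbf a_0)]^{-1}[D\mathbf f(\mathbf a_1)] \\
&= [D\mathbf f(\mathbf a_0)]^{-1}\bigl([D\mathbf f(\mathbf a_0)] - [D\mathbf f(\mathbf a_1)]\bigr).
\end{aligned} \tag{A2.19}$$

By Equation A2.2 we know that $|[D\mathbf f(\mathbf a_0)] - [D\mathbf f(\mathbf a_1)]| \le M|\mathbf a_0 - \mathbf a_1|$, and by definition we know $|\mathbf h_0| = |\mathbf a_1 - \mathbf a_0|$. So

$$|A| \le |[D\mathbf f(\mathbf a_0)]^{-1}|\,|\mathbf h_0|\,M. \tag{A2.20}$$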
By definition,

$$\mathbf h_0 = -[D\mathbf f(\mathbf a_0)]^{-1}\mathbf f(\mathbf a_0), \quad \text{so} \quad |\mathbf h_0| \le |[D\mathbf f(\mathbf a_0)]^{-1}|\,|\mathbf f(\mathbf a_0)| \tag{A2.21}$$

(Proposition 1.4.11, once more). This gives us

$$|A| \le \underbrace{|[D\mathbf f(\mathbf a_0)]^{-1}|\,|[D\mathbf f(\mathbf a_0)]^{-1}|\,|\mathbf f(\mathbf a_0)|\,M}_{\text{left-hand side of Inequality A2.3}}. \tag{A2.22}$$

Now Inequality A2.3 guarantees

$$|A| \le \frac12, \tag{A2.23}$$

which we can use to show that $[D\mathbf f(\mathbf a_1)]$ is invertible, as follows. We know from Proposition 1.5.31 that if $|A| < 1$, then the geometric series

$$I + A + A^2 + A^3 + \dots = B \tag{A2.24}$$


Note that in Equation A2.27 we use the number 1, not the identity matrix: $1 + |A| + |A|^2 + \dots$, not $I + A + A^2 + \dots$. This is crucial because since $|A| \le 1/2$, we have

$$|A| + |A|^2 + |A|^3 + \dots \le 1.$$

When we first wrote this proof, adapting a proof using the norm of a matrix rather than the length, we factored before using the triangle inequality, and ended up with $I + A + A^2 + \dots$. This was disastrous, because $|I| = \sqrt n$, not 1. The discovery that this could be fixed by factoring after using the triangle inequality was most welcome.


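converges, and that $B(I - A) = I$; i.e., that $B$ and $(I - A)$ are inverses of each other. This tells us that $[D\mathbf f(\mathbf a_1)]$ is invertible: from Equation A2.19 we know that $I - A = [D\mathbf f(\mathbf a_0)]^{-1}[D\mathbf f(\mathbf a_1)]$; and if the product of two square matrices is invertible, then they both are invertible. (See Exercise 2.5.14.)

In fact (by Proposition 1.2.15: $(AB)^{-1} = B^{-1}A^{-1}$) we have

$$B = (I - A)^{-1} = [D\mathbf f(\mathbf a_1)]^{-1}[D\mathbf f(\mathbf a_0)], \quad \text{so} \tag{A2.25}$$

$$\begin{aligned}
[D\mathbf f(\mathbf a_1)]^{-1} &= B[D\mathbf f(\mathbf a_0)]^{-1} \\
&= (I + A + A^2 + \dots)[D\mathbf f(\mathbf a_0)]^{-1} \\
&= [D\mathbf f(\mathbf a_0)]^{-1} + A[D\mathbf f(\mathbf a_0)]^{-1} + \dots,
\end{aligned} \tag{A2.26}$$

hence (by the triangle inequality and Proposition 1.4.11)

$$\begin{aligned}
|[D\mathbf f(\mathbf a_1)]^{-1}| &\le |[D\mathbf f(\mathbf a_0)]^{-1}| + |A|\,|[D\mathbf f(\mathbf a_0)]^{-1}| + \dots \\
&= |[D\mathbf f(\mathbf a_0)]^{-1}|\,(1 + |A| + |A|^2 + \dots) \\
&\le |[D\mathbf f(\mathbf a_0)]^{-1}|\,\underbrace{(1 + 1/2 + 1/4 + \dots)}_{\text{since } |A| \le 1/2,\ \text{Eq. A2.23}} = 2|[D\mathbf f(\mathbf a_0)]^{-1}|.
\end{aligned} \tag{A2.27}$$

□ Lemma A2.3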
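The geometric-series argument can be tried on a small example. This is a sketch under our own assumptions: the matrices `A` and `C` below are hypothetical (`C` plays the role of $[D\mathbf f(\mathbf a_0)]^{-1}$), and `length` is the length of a matrix used in this book, the square root of the sum of the squares of the entries.

```python
import math

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def length(X):
    """Length of a matrix: sqrt of the sum of the squares of its entries."""
    return math.sqrt(sum(x * x for row in X for x in row))

I = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.2, -0.1], [0.3, 0.1]]     # hypothetical matrix with |A| <= 1/2
C = [[1.0, 2.0], [0.0, 1.0]]      # hypothetical stand-in for [Df(a_0)]^{-1}

# Partial sum of the geometric series B = I + A + A^2 + ...
B, power = I, I
for _ in range(60):
    power = matmul(power, A)
    B = add(B, power)

I_minus_A = [[I[i][j] - A[i][j] for j in range(2)] for i in range(2)]
product = matmul(B, I_minus_A)    # numerically the identity: B = (I - A)^{-1}
BC = matmul(B, C)                 # plays the role of [Df(a_1)]^{-1} in A2.26

print(length(A), length(I))
print(product)
print(length(BC), 2 * length(C))  # the bound of Equation A2.18
```

Note that `length(I)` is $\sqrt 2$ here, not 1, which is why the estimate has to be made on the numbers $1 + |A| + |A|^2 + \dots$ and not on the matrices.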


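So far we have proved (1). This enables us to define the next step of Newton’s method:

$$\mathbf h_1 = -[D\mathbf f(\mathbf a_1)]^{-1}\mathbf f(\mathbf a_1), \quad \mathbf a_2 = \mathbf a_1 + \mathbf h_1, \quad \text{and} \quad U_1 = \{\mathbf x \mid |\mathbf x - \mathbf a_2| \le |\mathbf h_1|\}. \tag{A2.28}$$

Now we will prove (4), which we will call Lemma A2.4:

Lemma A2.4. We have the inequality

$$|\mathbf f(\mathbf a_1)| \le \frac M2|\mathbf h_0|^2. \tag{A2.29}$$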
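Proof. This is a straightforward application of Proposition A2.1; but a miracle happens during the computation. Proposition A2.1 gives

$$|\mathbf f(\mathbf a_1) - \mathbf f(\mathbf a_0) - [D\mathbf f(\mathbf a_0)]\mathbf h_0| \le \frac M2|\mathbf h_0|^2. \tag{A2.30}$$

Remembering that $\mathbf h_0 = -[D\mathbf f(\mathbf a_0)]^{-1}\mathbf f(\mathbf a_0)$, we see that the third term in the sum on the left,

$$-[D\mathbf f(\mathbf a_0)]\mathbf h_0 = [D\mathbf f(\mathbf a_0)][D\mathbf f(\mathbf a_0)]^{-1}\mathbf f(\mathbf a_0) = \mathbf f(\mathbf a_0), \tag{A2.31}$$

cancels with the second term (that is the miracle). So we get

$$|\mathbf f(\mathbf a_1)| \le \frac M2|\mathbf h_0|^2 \quad \text{as required.} \quad \square \text{ Lemma A2.4} \tag{A2.32}$$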


The terms that cancel are exactly the value of the linearization to $\mathbf f$ at $\mathbf a_0$, evaluated at $\mathbf a_1$.

The first inequality in Equation A2.33 uses the definition of $\mathbf h_1$ (Equation A2.28) and Proposition 1.4.11. The second inequality uses Lemmas A2.3 and A2.4.

Note that the middle term of Equation A2.33 has $\mathbf a_1$, while the right-hand term has $\mathbf a_0$.


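Figure A2.1 explains the miracle: why the cancellation occurred.

Proof of Theorem 2.7.11 (Kantorovitch theorem) continued. Now we just string together the inequalities. We have proved (1) and (4). To prove statement (2), i.e., $|\mathbf h_1| \le |\mathbf h_0|/2$, we consider

$$|\mathbf h_1| \le |\mathbf f(\mathbf a_1)|\,|[D\mathbf f(\mathbf a_1)]^{-1}| \le \frac{M|\mathbf h_0|^2}{2}\,2|[D\mathbf f(\mathbf a_0)]^{-1}|. \tag{A2.33}$$

Now cancel the 2’s and write $|\mathbf h_0|^2$ as two factors:

$$|\mathbf h_1| \le |\mathbf h_0|\,M\,|[D\mathbf f(\mathbf a_0)]^{-1}|\,|\mathbf h_0|. \tag{A2.34}$$

Next, replace one of the $|\mathbf h_0|$, using the definition $\mathbf h_0 = -[D\mathbf f(\mathbf a_0)]^{-1}\mathbf f(\mathbf a_0)$, to get

$$|\mathbf h_1| \le |\mathbf h_0| \underbrace{\Bigl(M\,|[D\mathbf f(\mathbf a_0)]^{-1}|\,\overbrace{|\mathbf f(\mathbf a_0)|\,|[D\mathbf f(\mathbf a_0)]^{-1}|}^{\ge |\mathbf h_0|}\Bigr)}_{\le 1/2 \text{ by Inequality A2.3}} \le \frac{|\mathbf h_0|}{2}. \tag{A2.35}$$

Now to prove part (3), i.e.:

$$|\mathbf f(\mathbf a_1)|\,|[D\mathbf f(\mathbf a_1)]^{-1}|^2 \le |\mathbf f(\mathbf a_0)|\,|[D\mathbf f(\mathbf a_0)]^{-1}|^2. \tag{A2.36}$$

Using Lemma A2.4 to get a bound for $|\mathbf f(\mathbf a_1)|$, and Equation A2.18 to get a bound for $|[D\mathbf f(\mathbf a_1)]^{-1}|$, we write

$$\begin{aligned}
|\mathbf f(\mathbf a_1)|\,|[D\mathbf f(\mathbf a_1)]^{-1}|^2 &\le \frac{M|\mathbf h_0|^2}{2}\bigl(4|[D\mathbf f(\mathbf a_0)]^{-1}|^2\bigr) \\
&\le 2|[D\mathbf f(\mathbf a_0)]^{-1}|^2 M \underbrace{\bigl(|[D\mathbf f(\mathbf a_0)]^{-1}|\,|\mathbf f(\mathbf a_0)|\bigr)^2}_{\ge |\mathbf h_0|^2} \\
&= |[D\mathbf f(\mathbf a_0)]^{-1}|^2\,|\mathbf f(\mathbf a_0)|\;2\underbrace{|\mathbf f(\mathbf a_0)|\,|[D\mathbf f(\mathbf a_0)]^{-1}|^2 M}_{\text{at most } 1/2 \text{ by A2.3}} \\
&\le |[D\mathbf f(\mathbf a_0)]^{-1}|^2\,|\mathbf f(\mathbf a_0)|. \quad \square \text{ Theorem 2.7.11}
\end{aligned}$$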
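To see the theorem at work, here is a minimal sketch in which the hypotheses can be checked by hand. The mapping $\mathbf f(x, y) = (x^2 - y,\ y^2 - x)$, the initial guess $(1.1, 1.1)$ and all function names are our own choices for illustration. For this mapping $[D\mathbf f(\mathbf u)] - [D\mathbf f(\mathbf v)]$ is a diagonal matrix with entries $2(u_1 - v_1)$ and $2(u_2 - v_2)$, so the Lipschitz ratio is $M = 2$.

```python
import math

def f(p):
    x, y = p
    return (x * x - y, y * y - x)

def Df(p):
    x, y = p
    return ((2 * x, -1.0), (-1.0, 2 * y))

def inverse(m):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def length(v):
    """Length of a vector, or of a matrix viewed as a list of its entries."""
    flat = [e for row in v for e in row] if isinstance(v[0], tuple) else list(v)
    return math.sqrt(sum(e * e for e in flat))

M = 2.0                 # Lipschitz ratio of Df for this particular f
a = (1.1, 1.1)          # initial guess a_0
# Left-hand side of Inequality A2.3; it must be at most 1/2.
kantorovich = length(f(a)) * length(inverse(Df(a))) ** 2 * M

steps = []              # lengths |h_0|, |h_1|, ...
for _ in range(4):
    inv, fx = inverse(Df(a)), f(a)
    h = (-(inv[0][0] * fx[0] + inv[0][1] * fx[1]),
         -(inv[1][0] * fx[0] + inv[1][1] * fx[1]))
    steps.append(length(h))
    a = (a[0] + h[0], a[1] + h[1])

print(kantorovich)      # about 0.25, so the theorem applies
print(steps)            # each step is at most half the previous one, fact (2)
print(a)                # close to the root (1, 1)
```

Only four steps are taken: beyond that the increments reach the size of floating-point rounding and comparing them is no longer meaningful.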
A.3 Proof of Lemma 2.8.4 (Superconvergence)

Here we prove Lemma 2.8.4, used in proving that Newton’s method superconverges. Recall that

$$c = \frac{1 - k}{1 - 2k}\,|[D\mathbf f(\mathbf a_0)]^{-1}|\,\frac M2. \tag{2.8.3}$$

Lemma 2.8.4. If the conditions of Theorem 2.8.3 are satisfied, then for all $i$,

$$|\mathbf h_{i+1}| \le c|\mathbf h_i|^2. \tag{A3.1}$$

Proof. Look back at Lemma A2.4 (rewritten for $\mathbf a_i$):

$$|\mathbf f(\mathbf a_i)| \le \frac M2|\mathbf h_{i-1}|^2. \tag{A3.2}$$

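The definition

$$\mathbf h_i = -[D\mathbf f(\mathbf a_i)]^{-1}\mathbf f(\mathbf a_i) \tag{A3.3}$$

gives

$$|\mathbf h_i| \le |[D\mathbf f(\mathbf a_i)]^{-1}|\,|\mathbf f(\mathbf a_i)| \le \frac M2|[D\mathbf f(\mathbf a_i)]^{-1}|\,|\mathbf h_{i-1}|^2. \tag{A3.4}$$

This is an equation almost of the form $|\mathbf h_i| \le c|\mathbf h_{i-1}|^2$:

$$|\mathbf h_i| \le \underbrace{\frac M2|[D\mathbf f(\mathbf a_i)]^{-1}|}_{c_i}\,|\mathbf h_{i-1}|^2. \tag{A3.5}$$

The difference is that $c_i$ is not a constant but depends on $\mathbf a_i$. So the $\mathbf h_i$ will superconverge if we can find a bound on $|[D\mathbf f(\mathbf a_i)]^{-1}|$ valid for all $i$. (The term $M/2$ is not a problem because it is a constant.) We cannot find such a bound if the derivative $[D\mathbf f(\mathbf a)]$ is not invertible at the limit point $\mathbf a$. (We saw this in one dimension in Example 2.8.1, where $f'(1) = 0$.) In such a case $|[D\mathbf f(\mathbf a_i)]^{-1}| \to \infty$ as $\mathbf a_i \to \mathbf a$. But Lemma A3.1 says that if the product of the Kantorovitch inequality is strictly less than 1/2, we have such a bound.

Lemma A3.1 (A bound on $|[D\mathbf f(\mathbf a_i)]^{-1}|$). If

$$|\mathbf f(\mathbf a_0)|\,|[D\mathbf f(\mathbf a_0)]^{-1}|^2 M = k, \quad \text{where } k < 1/2, \tag{A3.6}$$

then all $[D\mathbf f(\mathbf a_i)]^{-1}$ exist and satisfy

$$|[D\mathbf f(\mathbf a_i)]^{-1}| \le |[D\mathbf f(\mathbf a_0)]^{-1}|\,\frac{1 - k}{1 - 2k}. \tag{A3.7}$$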

Proof of Lemma A3.1. Note that the $\mathbf a_1$ in Lemma A2.3 is replaced here by $\mathbf a_n$. Note also that Equation A2.35 now reads $|\mathbf h_1| \le k|\mathbf h_0|$ (and therefore $|\mathbf h_n| \le k|\mathbf h_{n-1}|$), so that

$$|\mathbf a_n - \mathbf a_0| = \Bigl|\sum_{i=0}^{n-1} \mathbf h_i\Bigr| \underbrace{\le}_{\substack{\text{triangle}\\ \text{inequality}}} \sum_{i=0}^{n-1} |\mathbf h_i| \le |\mathbf h_0|(1 + k + \dots + k^{n-1}) \le \frac{|\mathbf h_0|}{1 - k}. \tag{A3.8}$$

If we have such a bound, sooner or later superconvergence will occur.

The proof of this lemma is a rerun of the proof of Lemma A2.3; you may find it helpful to refer to that proof, as we are more concise here.


The $A_n$ in Equation A3.9 (where we have $\mathbf a_n$) corresponds to the $A$ in Lemma A2.3 (where we had $\mathbf a_1$).

The second inequality of Equation A3.10 uses Equation A3.8. The third uses the inequality $|\mathbf h_0| \le |[D\mathbf f(\mathbf a_0)]^{-1}|\,|\mathbf f(\mathbf a_0)|$; see Equation A2.21.


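Next write

$$A_n = I - [D\mathbf f(\mathbf a_0)]^{-1}[D\mathbf f(\mathbf a_n)] = [D\mathbf f(\mathbf a_0)]^{-1}\underbrace{\bigl([D\mathbf f(\mathbf a_0)] - [D\mathbf f(\mathbf a_n)]\bigr)}_{\substack{\text{length} \le M|\mathbf a_0 - \mathbf a_n| \text{ by}\\ \text{Lipschitz cond.}}}, \tag{A3.9}$$

so that

$$|A_n| \le |[D\mathbf f(\mathbf a_0)]^{-1}|\,M|\mathbf a_0 - \mathbf a_n| \le |[D\mathbf f(\mathbf a_0)]^{-1}|\,M\frac{|\mathbf h_0|}{1 - k} \le \frac{|[D\mathbf f(\mathbf a_0)]^{-1}|^2 M|\mathbf f(\mathbf a_0)|}{1 - k} = \frac{k}{1 - k}. \tag{A3.10}$$

We are assuming $k < 1/2$, so $|A_n| \le k/(1 - k) < 1$ and $I - A_n$ is invertible (by Proposition 1.5.31), and the same argument that led to Equation A2.27 here gives

$$|[D\mathbf f(\mathbf a_n)]^{-1}| \le |[D\mathbf f(\mathbf a_0)]^{-1}|\,(1 + |A_n| + |A_n|^2 + \dots) = |[D\mathbf f(\mathbf a_0)]^{-1}|\,\frac{1}{1 - |A_n|} \le |[D\mathbf f(\mathbf a_0)]^{-1}|\,\frac{1 - k}{1 - 2k}. \quad \square \tag{A3.11}$$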
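The inequality of Lemma 2.8.4 can be observed on a one-variable example. This is a sketch under stated assumptions: we take $f(x) = x^2 - 2$ with $a_0 = 1.5$ (our choice), for which $M = 2$, and we take the constant to be $c = \frac{1-k}{1-2k}\,|[Df(a_0)]^{-1}|\,\frac M2$, which is what results from inserting the bound of Lemma A3.1 into Equation A3.5.

```python
def newton_steps(f, df, a0, n):
    """Return the Newton increments h_0, ..., h_{n-1} starting from a0."""
    a, steps = a0, []
    for _ in range(n):
        h = -f(a) / df(a)
        steps.append(h)
        a += h
    return steps

f = lambda x: x * x - 2.0
df = lambda x: 2.0 * x
a0, M = 1.5, 2.0                      # f' is Lipschitz with ratio M = 2

inv0 = abs(1.0 / df(a0))              # |[Df(a_0)]^{-1}|
k = abs(f(a0)) * inv0**2 * M          # product of the Kantorovitch inequality
c = (1 - k) / (1 - 2 * k) * inv0 * M / 2

h = newton_steps(f, df, a0, 3)
# Check |h_{i+1}| <= c |h_i|^2 for the first two pairs of increments.
checks = [abs(h[i + 1]) <= c * h[i] ** 2 for i in range(2)]

print(k, c)
print(h)
print(checks)
```

In this example the two sides of the inequality agree to within a fraction of a percent, so the check is restricted to increments that are still far above rounding error.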


A.4 Proof of Differentiability of the Inverse Function


In Section 2.9 we proved the existence of an inverse function $\mathbf g$. As we mentioned there, a complete proof requires showing that $\mathbf g$ is continuously differentiable, and that $\mathbf g$ really is an inverse, not just a right inverse. We do this here.


Theorem 2.9.4 (The inverse function theorem). Let $W \subset \mathbb R^m$ be an open neighborhood of $\mathbf x_0$, and $\mathbf f : W \to \mathbb R^m$ be a continuously differentiable function. Set $\mathbf y_0 = \mathbf f(\mathbf x_0)$, and suppose that the derivative $L = [D\mathbf f(\mathbf x_0)]$ is invertible.

Let $R > 0$ be a number satisfying the following hypotheses:

(1) The ball $W_0$ of radius $2R|L^{-1}|$ and centered at $\mathbf x_0$ is contained in $W$.

(2) In $W_0$, the derivative satisfies the Lipschitz condition

$$|[D\mathbf f(\mathbf u)] - [D\mathbf f(\mathbf v)]| \le \frac{1}{2R|L^{-1}|^2}\,|\mathbf u - \mathbf v|. \tag{2.9.4}$$
There then exists a unique continuously differentiable mapping $\mathbf g$ from the ball of radius $R$ centered at $\mathbf y_0$ (which we will denote $V$) to the ball $W_0$:

$$\mathbf g : V \to W_0, \tag{2.9.5}$$

such that

$$\mathbf f(\mathbf g(\mathbf y)) = \mathbf y \quad \text{and} \quad [D\mathbf g(\mathbf y)] = [D\mathbf f(\mathbf g(\mathbf y))]^{-1}. \tag{2.9.6}$$

Moreover, the image of $\mathbf g$ contains the ball of radius $R_1$ around $\mathbf x_0$, where

$$R_1 = 2R|L^{-1}|^2\left(\sqrt{|L|^2 + \frac{2}{|L^{-1}|^2}} - |L|\right). \tag{2.9.7}$$


Recall (Equation 2.9.8) that $\mathbf f_{\mathbf y}(\mathbf x) \overset{\text{def}}{=} \mathbf f(\mathbf x) - \mathbf y = \mathbf 0$.


The first inequality on the second line of Equation A4.2 comes from the triangle inequality. We get the second inequality because at each step of Newton’s method, $|\mathbf h_i|$ is at most half of the previous. The last inequality comes from the fact (Equation 2.9.10 and Proposition 1.4.11) that $|\mathbf h_0(\mathbf y)| \le |L^{-1}|\,|\mathbf y_0 - \mathbf y|$.


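(1) Proving that $\mathbf g$ is continuous at $\mathbf y_0$

Let us show first that $\mathbf g$ is continuous at $\mathbf y_0$: that for all $\epsilon > 0$, there exists $\delta > 0$ such that when $|\mathbf y - \mathbf y_0| < \delta$, then $|\mathbf g(\mathbf y) - \mathbf g(\mathbf y_0)| < \epsilon$. Since $\mathbf g(\mathbf y)$ is the limit of Newton’s method for the equation $\mathbf f_{\mathbf y}(\mathbf x) = \mathbf 0$, starting at $\mathbf x_0$, it can be expressed as $\mathbf x_0$ plus the sum of all the steps $(\mathbf h_0(\mathbf y), \mathbf h_1(\mathbf y), \dots)$:

$$\mathbf g(\mathbf y) = \mathbf x_0 + \sum_{i=0}^{\infty} \mathbf h_i(\mathbf y). \tag{A4.1}$$

So, since $\mathbf g(\mathbf y_0) = \mathbf x_0$,

$$\begin{aligned}
|\mathbf g(\mathbf y) - \mathbf g(\mathbf y_0)| &= \Bigl|\sum_{i=0}^{\infty} \mathbf h_i(\mathbf y)\Bigr| \\
&\le \sum_{i=0}^{\infty} |\mathbf h_i(\mathbf y)| \le |\mathbf h_0(\mathbf y)|\Bigl(1 + \frac12 + \dots\Bigr) \le 2|L^{-1}|\,|\mathbf y - \mathbf y_0|.
\end{aligned} \tag{A4.2}$$

If we set

$$\delta = \frac{\epsilon}{2|L^{-1}|}, \tag{A4.3}$$

then when $|\mathbf y - \mathbf y_0| < \delta$, we have $|\mathbf g(\mathbf y) - \mathbf g(\mathbf y_0)| < \epsilon$.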

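(2) Proving that $\mathbf g$ is differentiable at $\mathbf y_0$

Next we must show that $\mathbf g$ is differentiable at $\mathbf y_0$, with derivative $[D\mathbf g(\mathbf y_0)] = L^{-1}$; i.e., that

$$\lim_{\mathbf k \to \mathbf 0} \frac{\bigl(\mathbf g(\mathbf y_0 + \mathbf k) - \mathbf g(\mathbf y_0)\bigr) - L^{-1}\mathbf k}{|\mathbf k|} = \mathbf 0. \tag{A4.4}$$

When $\mathbf y_0 + \mathbf k \in V$, define $\mathbf r(\mathbf k)$ to be the increment to $\mathbf x_0$ that under $\mathbf f$ gives the increment $\mathbf k$ to $\mathbf y_0$:

$$\mathbf f(\mathbf x_0 + \mathbf r(\mathbf k)) = \mathbf y_0 + \mathbf k, \tag{A4.5}$$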
To get the second line we just factor out $L^{-1}$.


Figure A4.1. Top: The Kantorovitch theorem guarantees that if we start at $\mathbf x_0$, there is a unique solution in $U_0$; it does not guarantee a unique solution in any neighborhood of $\mathbf x_0$. (In Section 2.7 we use $\mathbf a_0$ and $\mathbf a_1$ rather than $\mathbf x_0$ and $\mathbf x_1$.) Bottom: The inverse function theorem guarantees a unique solution in the neighborhood $W_0$ of $\mathbf x_0$.


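or, equivalently,

$$\mathbf g(\mathbf y_0 + \mathbf k) = \mathbf x_0 + \mathbf r(\mathbf k). \tag{A4.6}$$

Substituting the right-hand side of Equation A4.6 for $\mathbf g(\mathbf y_0 + \mathbf k)$ in the left-hand side of Equation A4.4, remembering that $\mathbf g(\mathbf y_0) = \mathbf x_0$, we find

$$\begin{aligned}
\lim_{\mathbf k \to \mathbf 0} \frac{\mathbf x_0 + \mathbf r(\mathbf k) - \mathbf x_0 - L^{-1}\mathbf k}{|\mathbf k|} &= \lim_{\mathbf k \to \mathbf 0} \frac{\mathbf r(\mathbf k) - L^{-1}\mathbf k}{|\mathbf r(\mathbf k)|}\,\frac{|\mathbf r(\mathbf k)|}{|\mathbf k|} \\
&= \lim_{\mathbf k \to \mathbf 0} \frac{L^{-1}\Bigl(L\mathbf r(\mathbf k) - \overbrace{\bigl(\mathbf f(\mathbf x_0 + \mathbf r(\mathbf k)) - \mathbf f(\mathbf x_0)\bigr)}^{\mathbf k \text{ by Eq. A4.5}}\Bigr)}{|\mathbf r(\mathbf k)|}\,\frac{|\mathbf r(\mathbf k)|}{|\mathbf k|}.
\end{aligned} \tag{A4.7}$$

We know that $\mathbf f$ is differentiable at $\mathbf x_0$, so the term

$$\frac{L\mathbf r(\mathbf k) - \mathbf f(\mathbf x_0 + \mathbf r(\mathbf k)) + \mathbf f(\mathbf x_0)}{|\mathbf r(\mathbf k)|} \tag{A4.8}$$

has limit 0 as $\mathbf r(\mathbf k) \to \mathbf 0$. So we need to show that $\mathbf r(\mathbf k) \to \mathbf 0$ when $\mathbf k \to \mathbf 0$. Using Equations A4.6 for the equality and A4.2 for the inequality, we have

$$|\mathbf r(\mathbf k)| = |\mathbf g(\mathbf y_0 + \mathbf k) - \mathbf g(\mathbf y_0)| \le 2|L^{-1}|\,|(\mathbf y_0 + \mathbf k) - \mathbf y_0|, \tag{A4.9}$$

$$\text{i.e.,} \quad |\mathbf r(\mathbf k)| \le 2|L^{-1}|\,|\mathbf k|. \tag{A4.10}$$

So the limit is 0 as $\mathbf k \to \mathbf 0$. In addition, the term $|\mathbf r(\mathbf k)|/|\mathbf k|$ is bounded:

$$\frac{|\mathbf r(\mathbf k)|}{|\mathbf k|} \le 2|L^{-1}|, \tag{A4.11}$$

so Theorem 1.5.21, part (e) says that A4.4 is true.

(3) Proving that $\mathbf g$ is an inverse, not just a right inverse

We have already proved that $\mathbf f$ is onto the neighborhood $V$ of $\mathbf y_0$; we want to show that it is injective (one to one), in the sense that $\mathbf g(\mathbf y)$ is the only solution $\mathbf x$ of $\mathbf f_{\mathbf y}(\mathbf x) = \mathbf 0$ with $\mathbf x \in W_0$. As illustrated by Figure A4.1, this is a stronger result than the one we already have from Kantorovitch’s theorem, which tells us that $\mathbf f_{\mathbf y}(\mathbf x) = \mathbf 0$ has a unique solution in $U_0$. Of course there is no free lunch; what did we pay to get the stronger statement?²

We will suppose that $\mathbf x$ is a solution, and show that $\mathbf x = \mathbf g(\mathbf y)$. First, we will express $\mathbf f_{\mathbf y}(\mathbf x)$ as the sum of (1) $\mathbf f_{\mathbf y}(\mathbf x_0)$, (2) a linear function $L$ of the increment to $\mathbf x_0$, and (3) a remainder $\mathbf r$:

$$\mathbf 0 = \mathbf f_{\mathbf y}(\mathbf x) = \mathbf f_{\mathbf y}(\mathbf x_0) + L(\mathbf x - \mathbf x_0) + \mathbf r, \tag{A4.12}$$

² We are requiring the Lipschitz condition to be satisfied on all of $W_0$, not just on $U_0$.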
The smaller the set in which one can guarantee existence, the better: it is a stronger statement to say that there exists a William Ardvark in the town of Nowhere, NY, population 523, than to say there exists a William Ardvark in New York State. The larger the set in which one can guarantee uniqueness, the better: it is a stronger statement to say there exists a unique John W. Smith in the state of California than to say there exists a unique John W. Smith in Tinytown, CA.


Here, to complete our proof of Theorem 2.9.4, we show that $\mathbf g$ really is an inverse, not just a right inverse: i.e., that $\mathbf g(\mathbf f(\mathbf x)) = \mathbf x$. Thus our situation is not like $f(x) = x^2$ and $g(y) = \sqrt y$. In that case $f(g(y)) = y$, but $g(f(x)) \ne x$ when $x < 0$. The function $f(x) = x^2$ is neither injective in any neighborhood of 0 in the domain, nor surjective (onto) in any neighborhood of 0 in the range.


Equation A4.20: the first is from Equation 2.9.4, and the second because $\mathbf x$ is in $W_0$, a ball centered at $\mathbf x_0$ with radius $2R|L^{-1}|$.

This shows in particular that $\mathbf g \circ \mathbf f(\mathbf x) = \mathbf x$ on the image of $\mathbf g$, and the image of $\mathbf g$ contains those points $\mathbf x \in W_0$ such that $\mathbf f(\mathbf x) \in V$.


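where $\mathbf r$ is the remainder necessary to make the equality true. If we think of $\mathbf x$ as $\mathbf x_0$ plus an increment $\mathbf s$:

$$\mathbf x = \mathbf x_0 + \mathbf s, \tag{A4.13}$$

we can express $\mathbf r$ as

$$\mathbf r = \mathbf f_{\mathbf y}(\mathbf x_0 + \mathbf s) - \mathbf f_{\mathbf y}(\mathbf x_0) - L\mathbf s. \tag{A4.14}$$

We will use Proposition A2.1, which says that if the derivative of a differentiable mapping is Lipschitz, with Lipschitz ratio $M$, then

$$|\mathbf f(\mathbf x + \mathbf h) - \mathbf f(\mathbf x) - [D\mathbf f(\mathbf x)]\mathbf h| \le \frac M2|\mathbf h|^2. \tag{A4.15}$$

We know that the derivative satisfies the Lipschitz condition of Equation 2.9.4, so we have

$$|\mathbf f_{\mathbf y}(\mathbf x_0 + \mathbf s) - \mathbf f_{\mathbf y}(\mathbf x_0) - L\mathbf s| \le \frac M2|\mathbf s|^2, \quad \text{i.e.,} \quad |\mathbf r| \le \frac M2|\mathbf x - \mathbf x_0|^2. \tag{A4.16}$$

Multiplying Equation A4.12 by $L^{-1}$, we find

$$\mathbf x - \mathbf x_0 = -L^{-1}\mathbf f_{\mathbf y}(\mathbf x_0) - L^{-1}\mathbf r. \tag{A4.17}$$

Remembering from the Kantorovitch theorem (Theorem 2.7.11) that $\mathbf a_0 = \mathbf a_1 + [D\mathbf f(\mathbf a_0)]^{-1}\mathbf f(\mathbf a_0)$, which, with our present notation, is $\mathbf x_0 = \mathbf x_1 + L^{-1}\mathbf f_{\mathbf y}(\mathbf x_0)$, and substituting this value for $\mathbf x_0$ in Equation A4.17, we see that

$$\mathbf x - \mathbf x_1 = -L^{-1}\mathbf r. \tag{A4.18}$$

We use the value for $\mathbf r$ in Equation A4.16 to get

$$|\mathbf x - \mathbf x_1| \le \frac M2|L^{-1}|\,|\mathbf x - \mathbf x_0|^2. \tag{A4.19}$$

Remember that

$$M = \frac{1}{2R|L^{-1}|^2} \quad \text{and} \quad |\mathbf x - \mathbf x_0|^2 \le 4R^2|L^{-1}|^2. \tag{A4.20}$$

Substituting these values in Equation A4.19, we get

$$|\mathbf x - \mathbf x_1| \le \frac12 \cdot \frac{1}{2R|L^{-1}|^2} \cdot |L^{-1}| \cdot 4R^2|L^{-1}|^2, \tag{A4.21}$$

$$\text{i.e.,} \quad |\mathbf x - \mathbf x_1| \le |L^{-1}|R. \tag{A4.22}$$

So $\mathbf x$ is in a ball of radius $2|L^{-1}|R$ around $\mathbf x_0$, and in a ball of radius $|L^{-1}|R$ around $\mathbf x_1$, and (continuing the argument) in a ball of radius $|L^{-1}|R/2$ around $\mathbf x_2$, and so on. Thus it is the limit of the $\mathbf x_n$, i.e., $\mathbf x = \mathbf g(\mathbf y)$. □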
A.5 Proof of the Implicit Function Theorem


It would be possible, and in some sense more natural, to prove the theorem directly, using the Kantorovitch theorem. But this approach will avoid our having to go through all the work of proving that the implicit function is continuously differentiable.

When we add a tilde to $\mathbf F$, creating the function $\tilde{\mathbf F}$ of Equation A5.6, we use $\mathbf F\begin{pmatrix}\mathbf x\\ \mathbf y\end{pmatrix}$ as the first $n$ coordinates of $\tilde{\mathbf F}$ and stick on $\mathbf y$ ($m$ coordinates) at the bottom; $\mathbf y$ just goes along for the ride. We do this to fix the dimensions: $\tilde{\mathbf F} : \mathbb R^{n+m} \to \mathbb R^{n+m}$ can have an inverse function, while $\mathbf F$ can’t.


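Theorem 2.9.10 (The implicit function theorem). Let $W$ be an open neighborhood of $\mathbf c = \begin{pmatrix}\mathbf a\\ \mathbf b\end{pmatrix} \in \mathbb R^{n+m}$, and $\mathbf F : W \to \mathbb R^n$ be differentiable, with $\mathbf F(\mathbf c) = \mathbf 0$. Suppose that the $n \times n$ matrix

$$[D_1\mathbf F(\mathbf c), \dots, D_n\mathbf F(\mathbf c)], \tag{A5.1}$$

representing the first $n$ columns of the derivative of $\mathbf F$, is invertible. Then the following matrix, which we denote $L$, is invertible also:

$$L = \begin{bmatrix} [D_1\mathbf F(\mathbf c), \dots, D_n\mathbf F(\mathbf c)] & [D_{n+1}\mathbf F(\mathbf c), \dots, D_{n+m}\mathbf F(\mathbf c)] \\ 0 & I_m \end{bmatrix}. \tag{A5.2}$$

Let $W_0 = B_{2R|L^{-1}|}(\mathbf c) \subset \mathbb R^{n+m}$ be the ball of radius $2R|L^{-1}|$ centered at $\mathbf c$. Suppose that $R > 0$ satisfies the following hypotheses:

(1) It is small enough so that $W_0 \subset W$.

(2) In $W_0$, the derivative satisfies the Lipschitz condition

$$|[D\mathbf F(\mathbf u)] - [D\mathbf F(\mathbf v)]| \le \frac{1}{2R|L^{-1}|^2}\,|\mathbf u - \mathbf v|. \tag{A5.3}$$

Let $B_R(\mathbf b) \subset \mathbb R^m$ be the ball of radius $R$ centered at $\mathbf b$.

There then exists a unique continuously differentiable mapping

$$\mathbf g : B_R(\mathbf b) \to B_{2R|L^{-1}|}(\mathbf a) \quad \text{such that} \quad \mathbf F\begin{pmatrix}\mathbf g(\mathbf y)\\ \mathbf y\end{pmatrix} = \mathbf 0 \quad \text{for all } \mathbf y \in B_R(\mathbf b), \tag{A5.4}$$

and the derivative of the implicit function $\mathbf g$ at $\mathbf b$ is

$$[D\mathbf g(\mathbf b)] = -[D_1\mathbf F(\mathbf c), \dots, D_n\mathbf F(\mathbf c)]^{-1}[D_{n+1}\mathbf F(\mathbf c), \dots, D_{n+m}\mathbf F(\mathbf c)]. \tag{A5.5}$$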


Proof. The inverse function theorem is obviously a special case of the implicit function theorem: the special case where $\mathbf F\begin{pmatrix}\mathbf x\\ \mathbf y\end{pmatrix} = \mathbf f(\mathbf x) - \mathbf y$; i.e., we can separate out the $\mathbf y$ from $\mathbf F\begin{pmatrix}\mathbf x\\ \mathbf y\end{pmatrix}$. There is a sneaky way of making the implicit function theorem be a special case of the inverse function theorem. We will create a new function $\tilde{\mathbf F}$ to which we can apply the inverse function theorem. Then we will show how the inverse of $\tilde{\mathbf F}$ will give us our implicit function $\mathbf g$.

Consider the function $\tilde{\mathbf F} : W \to \mathbb R^n \times \mathbb R^m$, defined by

$$\tilde{\mathbf F}\begin{pmatrix}\mathbf x\\ \mathbf y\end{pmatrix} = \begin{pmatrix}\mathbf F\begin{pmatrix}\mathbf x\\ \mathbf y\end{pmatrix}\\ \mathbf y\end{pmatrix}, \tag{A5.6}$$

where $\mathbf x$ are $n$ variables, which we have put as the first variables, and $\mathbf y$ the remaining $m$ variables, which we have put last. Whereas $\mathbf F$ goes from the high-dimensional space, $W \subset \mathbb R^{n+m}$, to the lower-dimensional space, $\mathbb R^n$, and thus had no hope of having an inverse, the domain and range of $\tilde{\mathbf F}$ have the same dimension: $n + m$, as illustrated by Figure A5.1.



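FIGURE A5.1. The mapping $\tilde{\mathbf F}$ is designed to add dimensions to the image of $\mathbf F$ so that the image has the same dimension as the domain.


Exercise 2.3.6 addresses the question why $L$ is invertible if $[D_1\mathbf F(\mathbf c), \dots, D_n\mathbf F(\mathbf c)]$ is invertible.

So now we will find an inverse of $\tilde{\mathbf F}$, and we will show that the first coordinates of that inverse are precisely the implicit function $\mathbf g$.

The derivative of $\tilde{\mathbf F}$ at $\mathbf c$ is

$$[D\tilde{\mathbf F}(\mathbf c)] = \begin{bmatrix} [D_1\mathbf F(\mathbf c), \dots, D_n\mathbf F(\mathbf c)] & [D_{n+1}\mathbf F(\mathbf c), \dots, D_{n+m}\mathbf F(\mathbf c)] \\ 0 & I \end{bmatrix} = L, \tag{A5.7}$$

showing that it is invertible at $\mathbf c$ precisely when $[D_1\mathbf F(\mathbf c), \dots, D_n\mathbf F(\mathbf c)]$ is invertible, i.e., the hypothesis of the inverse function theorem (Theorem 2.9.4).

The derivative

$$[D\tilde{\mathbf F}(\mathbf u)] = \begin{bmatrix} [D\mathbf F(\mathbf u)] \\ 0 \mid I \end{bmatrix}$$

is an $(n + m) \times (n + m)$ matrix; the entry $[D\mathbf F(\mathbf u)]$ is a matrix $n$ tall and $n + m$ wide; the 0 matrix is $m$ high and $n$ wide; the identity matrix is $m \times m$.

We denote by $B_R\begin{pmatrix}\mathbf 0\\ \mathbf b\end{pmatrix}$ the ball of radius $R$ centered at $\begin{pmatrix}\mathbf 0\\ \mathbf b\end{pmatrix}$. While $\tilde{\mathbf G}$ is defined on all of $B_R\begin{pmatrix}\mathbf 0\\ \mathbf b\end{pmatrix}$, we will only be interested in points $\tilde{\mathbf G}\begin{pmatrix}\mathbf 0\\ \mathbf y\end{pmatrix}$.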


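Note that the conditions (1) and (2) above look the same as the conditions (1) and (2) of the inverse function theorem applied to $\tilde{\mathbf F}$ (modulo a change of notation). Condition (1) is obviously met: $\tilde{\mathbf F}$ is defined wherever $\mathbf F$ is. There is though a slight problem with condition (2): our hypothesis of Equation A5.3 refers to the derivative of $\mathbf F$ being Lipschitz; now we need the derivative of $\tilde{\mathbf F}$ to be Lipschitz in order to show that it has an inverse. Since the derivative of $\tilde{\mathbf F}$ is $[D\tilde{\mathbf F}(\mathbf u)] = \begin{bmatrix} [D\mathbf F(\mathbf u)] \\ 0 \mid I \end{bmatrix}$, when we compute $|[D\tilde{\mathbf F}(\mathbf u)] - [D\tilde{\mathbf F}(\mathbf v)]|$, the identity matrices cancel, giving

$$|[D\tilde{\mathbf F}(\mathbf u)] - [D\tilde{\mathbf F}(\mathbf v)]| = |[D\mathbf F(\mathbf u)] - [D\mathbf F(\mathbf v)]|. \tag{A5.8}$$

Thus $\tilde{\mathbf F}$ is locally invertible; there exists a unique inverse $\tilde{\mathbf G} : B_R\begin{pmatrix}\mathbf 0\\ \mathbf b\end{pmatrix} \to W_0$.

In particular, when $|\mathbf y - \mathbf b| < R$,

$$\tilde{\mathbf F}\left(\tilde{\mathbf G}\begin{pmatrix}\mathbf 0\\ \mathbf y\end{pmatrix}\right) = \begin{pmatrix}\mathbf 0\\ \mathbf y\end{pmatrix}. \tag{A5.9}$$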
Equation A5.10: The function $\tilde{\mathbf G}$ has exactly the same relationship to $\mathbf G$ as $\tilde{\mathbf F}$ does to $\mathbf F$; to go from $\mathbf G$ to $\tilde{\mathbf G}$ we stick on $\mathbf y$ at the bottom. Since $\tilde{\mathbf F}$ does not change the second coordinate, its inverse cannot change it either.


Exercise A5.1 asks you to show that the implicit function found this way is unique.


In Equation A5.13 $[D\mathbf g(\mathbf b)]$ is an $n \times m$ matrix, $I$ is $m \times m$, and 0 is the $n \times m$ zero matrix. In this equation we are using the fact that $\mathbf g$ is differentiable; otherwise we could not apply the chain rule.


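Now let’s denote by $\mathbf G$ the first $n$ coordinates of $\tilde{\mathbf G}$:

$$\tilde{\mathbf G}\begin{pmatrix}\mathbf x\\ \mathbf y\end{pmatrix} = \begin{pmatrix}\mathbf G\begin{pmatrix}\mathbf x\\ \mathbf y\end{pmatrix}\\ \mathbf y\end{pmatrix}, \quad \text{so} \quad \tilde{\mathbf G}\begin{pmatrix}\mathbf 0\\ \mathbf y\end{pmatrix} = \begin{pmatrix}\mathbf G\begin{pmatrix}\mathbf 0\\ \mathbf y\end{pmatrix}\\ \mathbf y\end{pmatrix}. \tag{A5.10}$$

Since $\tilde{\mathbf F}$ is the inverse of $\tilde{\mathbf G}$,

$$\tilde{\mathbf F}\left(\tilde{\mathbf G}\begin{pmatrix}\mathbf 0\\ \mathbf y\end{pmatrix}\right) = \begin{pmatrix}\mathbf 0\\ \mathbf y\end{pmatrix}. \tag{A5.11}$$

By the definition of $\tilde{\mathbf G}$ we have

$$\tilde{\mathbf F}\begin{pmatrix}\mathbf G\begin{pmatrix}\mathbf 0\\ \mathbf y\end{pmatrix}\\ \mathbf y\end{pmatrix} = \begin{pmatrix}\mathbf 0\\ \mathbf y\end{pmatrix}.$$

Now set $\mathbf g(\mathbf y) = \mathbf G\begin{pmatrix}\mathbf 0\\ \mathbf y\end{pmatrix}$. This gives

$$\tilde{\mathbf F}\begin{pmatrix}\mathbf g(\mathbf y)\\ \mathbf y\end{pmatrix} = \begin{pmatrix}\mathbf F\begin{pmatrix}\mathbf g(\mathbf y)\\ \mathbf y\end{pmatrix}\\ \mathbf y\end{pmatrix} = \begin{pmatrix}\mathbf 0\\ \mathbf y\end{pmatrix}, \quad \text{i.e.,} \quad \mathbf F\begin{pmatrix}\mathbf g(\mathbf y)\\ \mathbf y\end{pmatrix} = \mathbf 0. \tag{A5.12}$$

$\mathbf g$ is the required “implicit function”: $\mathbf F\begin{pmatrix}\mathbf x\\ \mathbf y\end{pmatrix} = \mathbf 0$ implicitly defines $\mathbf x$ in terms of $\mathbf y$, and $\mathbf g$ makes this relationship explicit.

Now we need to prove Equation A5.5 for the derivative of the implicit function $\mathbf g$. This follows from the chain rule. Since $\mathbf F\begin{pmatrix}\mathbf g(\mathbf y)\\ \mathbf y\end{pmatrix} = \mathbf 0$, the derivative of the left side with respect to $\mathbf y$ is also 0, which gives (by the chain rule),

$$\left[D\mathbf F\begin{pmatrix}\mathbf g(\mathbf b)\\ \mathbf b\end{pmatrix}\right]\begin{bmatrix}[D\mathbf g(\mathbf b)]\\ I\end{bmatrix} = 0. \tag{A5.13}$$

Remember that $\mathbf c = \begin{pmatrix}\mathbf a\\ \mathbf b\end{pmatrix} = \begin{pmatrix}\mathbf g(\mathbf b)\\ \mathbf b\end{pmatrix}$. So

$$\bigl[D_1\mathbf F(\mathbf c), \dots, D_n\mathbf F(\mathbf c),\ D_{n+1}\mathbf F(\mathbf c), \dots, D_{n+m}\mathbf F(\mathbf c)\bigr]\begin{bmatrix}[D\mathbf g(\mathbf b)]\\ I\end{bmatrix} = 0. \tag{A5.14}$$

If $A$ denotes the first $n$ columns of $[D\mathbf F(\mathbf c)]$ and $B$ the last $m$ columns, we have

$$A[D\mathbf g(\mathbf b)] + B = 0, \quad \text{so} \quad A[D\mathbf g(\mathbf b)] = -B, \quad \text{so} \quad [D\mathbf g(\mathbf b)] = -A^{-1}B. \tag{A5.15}$$

Substituting back, this is exactly what we wanted to prove:

$$[D\mathbf g(\mathbf b)] = \underbrace{-[D_1\mathbf F(\mathbf c), \dots, D_n\mathbf F(\mathbf c)]^{-1}}_{-A^{-1}}\,\underbrace{[D_{n+1}\mathbf F(\mathbf c), \dots, D_{n+m}\mathbf F(\mathbf c)]}_{B}. \quad \square$$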
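The formula $[D\mathbf g(\mathbf b)] = -A^{-1}B$ can be tested in the simplest case $n = m = 1$. The following sketch uses an example of our own, the unit circle $F(x, y) = x^2 + y^2 - 1$ near the point $(0.6, 0.8)$; the function names and the way $g$ is computed (Newton’s method in the variable $x$, with $y$ held fixed) are assumptions made for illustration.

```python
def F(x, y):
    return x * x + y * y - 1.0

def D1F(x, y):
    """Partial derivative of F with respect to x (the matrix A, here 1 x 1)."""
    return 2.0 * x

def D2F(x, y):
    """Partial derivative of F with respect to y (the matrix B, here 1 x 1)."""
    return 2.0 * y

def g(y, x_start=0.6, iterations=20):
    """Implicit function: solve F(x, y) = 0 for x by Newton's method in x."""
    x = x_start
    for _ in range(iterations):
        x -= F(x, y) / D1F(x, y)
    return x

a, b = 0.6, 0.8                          # F(a, b) = 0 and D1F(a, b) != 0
formula = -D2F(a, b) / D1F(a, b)         # -A^{-1} B, as in Equation A5.15

eps = 1e-6
finite_difference = (g(b + eps) - g(b - eps)) / (2 * eps)

print(formula)
print(finite_difference)
```

The two printed numbers should agree to several decimal places; the finite difference is only an approximation of the derivative, so exact equality is not to be expected.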
A.6 Proof of Theorem 3.3.9: Equality of Crossed Partials


Theorem 3.3.9. Let $f : U \to \mathbb R$ be a function such that all second partial derivatives exist and are continuous. Then for every pair of variables $x_i, x_j$, the crossed partials are equal:

$$D_j(D_if)(\mathbf a) = D_i(D_jf)(\mathbf a). \tag{3.3.20}$$

Of course the second partials do not exist unless the first partials exist and are continuous, in fact, differentiable.

As we saw in Equation 1.7.5, the partial derivative can be written in terms of the standard basis vectors:

$$D_if(\mathbf a) = \lim_{h \to 0} \frac{f(\mathbf a + h\mathbf e_i) - f(\mathbf a)}{h}.$$

Proof. First, let us expand the definition of the second partial derivative. In the first line of Equation A6.1 we express the first partial derivative $D_i$ as a limit, treating $D_jf$ as nothing more than the function to which $D_i$ applies. In the second line we rewrite $D_jf$ as a limit:

$$\begin{aligned}
D_i(D_jf)(\mathbf a) &= \lim_{h \to 0} \frac1h\bigl(D_jf(\mathbf a + h\mathbf e_i) - D_jf(\mathbf a)\bigr) \\
&= \lim_{h \to 0} \frac1h\Bigl(\underbrace{\lim_{k \to 0} \frac1k\bigl(f(\mathbf a + h\mathbf e_i + k\mathbf e_j) - f(\mathbf a + h\mathbf e_i)\bigr)}_{D_jf(\mathbf a + h\mathbf e_i)} - \underbrace{\lim_{k \to 0} \frac1k\bigl(f(\mathbf a + k\mathbf e_j) - f(\mathbf a)\bigr)}_{D_jf(\mathbf a)}\Bigr) \\
&= \lim_{h \to 0} \lim_{k \to 0} \frac{1}{hk}\Bigl(\bigl(f(\mathbf a + h\mathbf e_i + k\mathbf e_j) - f(\mathbf a + h\mathbf e_i)\bigr) - \bigl(f(\mathbf a + k\mathbf e_j) - f(\mathbf a)\bigr)\Bigr) \\
&= \lim_{h \to 0} \lim_{k \to 0} \frac{1}{hk}\bigl(f(\mathbf a + h\mathbf e_i + k\mathbf e_j) - f(\mathbf a + h\mathbf e_i) - f(\mathbf a + k\mathbf e_j) + f(\mathbf a)\bigr).
\end{aligned} \tag{A6.1}$$

Please observe that the part in parentheses in the last line of Equation A6.1 is completely symmetric with respect to $\mathbf e_i$ and $\mathbf e_j$; after all, $f(\mathbf a + h\mathbf e_i + k\mathbf e_j) = f(\mathbf a + k\mathbf e_j + h\mathbf e_i)$. So it may seem that the result is simply obvious. The problem is the order of the limits: you have to take them in the order in which they are written. For instance,

$$\lim_{x \to 0} \lim_{y \to 0} \frac{x^2 - y^2}{x^2 + y^2} = 1, \quad \text{but} \quad \lim_{y \to 0} \lim_{x \to 0} \frac{x^2 - y^2}{x^2 + y^2} = -1.$$

We now define the function

$$u(t) = f(\mathbf a + t\mathbf e_i + k\mathbf e_j) - f(\mathbf a + t\mathbf e_i), \quad \text{so that} \tag{A6.2}$$

$$u(h) = f(\mathbf a + h\mathbf e_i + k\mathbf e_j) - f(\mathbf a + h\mathbf e_i) \quad \text{and} \quad u(0) = f(\mathbf a + k\mathbf e_j) - f(\mathbf a). \tag{A6.3}$$

This allows us to rewrite Equation A6.1 as

$$D_i(D_jf)(\mathbf a) = \lim_{h \to 0} \lim_{k \to 0} \frac{1}{hk}\bigl(u(h) - u(0)\bigr). \tag{A6.4}$$

Since $u$ is a differentiable function, the mean value theorem (Theorem 1.6.9) asserts that for every $h > 0$, there exists $h_1$ between 0 and $h$ satisfying

$$\frac{u(h) - u(0)}{h} = u'(h_1), \quad \text{so that} \quad u(h) - u(0) = hu'(h_1). \tag{A6.5}$$

This allows us to rewrite Equation A6.4 as

$$D_i(D_jf)(\mathbf a) = \lim_{h \to 0} \lim_{k \to 0} \frac{1}{hk}\,hu'(h_1). \tag{A6.6}$$

Since
This is a surprisingly difficult result. In Exercise 4.5.11 we give a very simple (but less obvious) proof using Fubini’s theorem. Here, with fewer tools, we must work harder: we apply the mean value theorem twice, to carefully chosen functions. Even having said this, the proof isn’t obvious.


$$u(h_1) = f(\mathbf a + h_1\mathbf e_i + k\mathbf e_j) - f(\mathbf a + h_1\mathbf e_i), \tag{A6.7}$$

the derivative of $u$ at $h_1$ is the sum of the derivatives of its two terms:

$$u'(h_1) = D_if(\mathbf a + h_1\mathbf e_i + k\mathbf e_j) - D_if(\mathbf a + h_1\mathbf e_i), \quad \text{so} \tag{A6.8}$$

$$D_i(D_jf)(\mathbf a) = \lim_{h \to 0} \lim_{k \to 0} \frac1k\underbrace{\bigl(D_if(\mathbf a + h_1\mathbf e_i + k\mathbf e_j) - D_if(\mathbf a + h_1\mathbf e_i)\bigr)}_{u'(h_1)}. \tag{A6.9}$$

Now we create a new function so we can apply the mean value theorem again. We replace the part in brackets on the right-hand side of Equation A6.9 by the difference $v(k) - v(0)$, where $v$ is the function defined by

$$v(k) = D_if(\mathbf a + h_1\mathbf e_i + k\mathbf e_j). \tag{A6.10}$$

This allows us to rewrite Equation A6.9 as

$$D_i(D_jf)(\mathbf a) = \lim_{h \to 0} \lim_{k \to 0} \frac1k\bigl(v(k) - v(0)\bigr). \tag{A6.11}$$

Once more we use the mean value theorem.

Again $v$ is differentiable, so there exists $k_1$ between 0 and $k$ such that

$$v(k) - v(0) = kv'(k_1) = k\bigl(D_j(D_if)\bigr)(\mathbf a + h_1\mathbf e_i + k_1\mathbf e_j). \tag{A6.12}$$

Substituting this in Equation A6.11 gives

$$D_i(D_jf)(\mathbf a) = \lim_{h \to 0} \lim_{k \to 0} \frac1k\,kv'(k_1) = \lim_{h \to 0} \lim_{k \to 0} \bigl(D_j(D_if)\bigr)(\mathbf a + h_1\mathbf e_i + k_1\mathbf e_j). \tag{A6.13}$$

Now we use the hypothesis that the second partial derivatives are continuous. As $h$ and $k$ tend to 0, so do $h_1$ and $k_1$, so

$$D_i(D_jf)(\mathbf a) = \lim_{h \to 0} \lim_{k \to 0} \bigl(D_j(D_if)\bigr)(\mathbf a + h_1\mathbf e_i + k_1\mathbf e_j) = D_j(D_if)(\mathbf a). \quad \square \tag{A6.14}$$
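Both halves of the story can be watched numerically: the symmetric difference quotient of Equation A6.1 approximating a crossed partial, and the way the order of limits matters when continuity fails. This is a sketch; the functions `f` and `q`, the sample point and the step sizes are our own choices for illustration (`q` is a standard example of iterated limits that disagree).

```python
import math

def second_difference(f, a, b, h, k):
    """The quotient in the last line of Equation A6.1, for f of two variables."""
    return (f(a + h, b + k) - f(a + h, b) - f(a, b + k) + f(a, b)) / (h * k)

# A smooth function, so its second partials are continuous.
f = lambda x, y: math.sin(x * y) + x**3 * y
a, b = 0.7, -0.4

# Computed by hand, in either order:
# D1(D2 f) = D2(D1 f) = cos(xy) - xy sin(xy) + 3x^2.
exact = math.cos(a * b) - a * b * math.sin(a * b) + 3 * a**2
approx = second_difference(f, a, b, 1e-4, 1e-4)

# Without continuity at the origin, the order of the limits can matter.
q = lambda x, y: (x * x - y * y) / (x * x + y * y)
y_first = q(1e-3, 1e-12)    # y has (almost) gone to 0 first: close to  1
x_first = q(1e-12, 1e-3)    # x has (almost) gone to 0 first: close to -1

print(exact, approx)
print(y_first, x_first)
```

The difference quotient is symmetric in the roles of `h` and `k`, which is why a single number approximates both crossed partials.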


A.7 Proof of Proposition 3.3.19


Proposition 3.3.19 (Size of a function with many vanishing partial derivatives). Let $U$ be an open subset of $\mathbb R^n$ and $f : U \to \mathbb R$ be a $C^k$ function. If at $\mathbf a \in U$ all partials up to order $k$ vanish (including the 0th partial derivative, i.e., $f(\mathbf a)$), then

$$\lim_{\mathbf h \to \mathbf 0} \frac{f(\mathbf a + \mathbf h)}{|\mathbf h|^k} = 0. \tag{3.3.39}$$
The case $k = 1$ is the case where $f$ is a $C^1$ function, once continuously differentiable.


Proof. The proof is by induction on $k$, starting with $k = 1$. The case $k = 1$ follows from Theorem 1.9.5: if $f$ vanishes at $\mathbf a$, and its first partials are continuous and vanish at $\mathbf a$, then $f$ is differentiable at $\mathbf a$, with derivative 0. So

$$0 = \underbrace{\lim_{\mathbf h \to \mathbf 0} \frac{f(\mathbf a + \mathbf h) - \overbrace{f(\mathbf a)}^{0} - \overbrace{[Df(\mathbf a)]\mathbf h}^{0}}{|\mathbf h|}}_{= 0 \text{ since } f \text{ is differentiable}} = \lim_{\mathbf h \to \mathbf 0} \frac{f(\mathbf a + \mathbf h)}{|\mathbf h|}. \tag{A7.1}$$

This proves the case $k = 1$.

Now we write $f(\mathbf a + \mathbf h)$ in a form that separates out the entries of the increment vector $\mathbf h$, so that we can apply the mean value theorem.

Write $f(\mathbf a + \mathbf h) = f(\mathbf a + \mathbf h) - f(\mathbf a) =$

$$\begin{aligned}
&\underbrace{f(a_1 + h_1, a_2 + h_2, a_3 + h_3, \dots, a_n + h_n) - f(a_1, a_2 + h_2, a_3 + h_3, \dots, a_n + h_n)}_{\text{changing only } h_1} \\
{}+{} &\underbrace{f(a_1, a_2 + h_2, a_3 + h_3, \dots, a_n + h_n) - f(a_1, a_2, a_3 + h_3, \dots, a_n + h_n)}_{\text{changing only } h_2} \\
{}+{} &\underbrace{f(a_1, a_2, a_3 + h_3, \dots, a_n + h_n) - f(a_1, a_2, a_3, \dots, a_n + h_n)}_{\text{changing only } h_3} \\
{}+{} &\cdots \\
{}+{} &\underbrace{f(a_1, a_2, \dots, a_{n-1}, a_n + h_n) - f(a_1, a_2, \dots, a_{n-1}, a_n)}_{\text{changing only } h_n}.
\end{aligned} \tag{A7.2}$$

This equation is simpler than it looks. At each step, we allow just one entry of the variable $\mathbf h$ to vary. We first subtract, then add, identical terms, which cancel.
By the mean value theorem, the $i$th term in Equation A7.2 is

$$f(a_1, \dots, a_{i-1}, a_i + h_i, a_{i+1} + h_{i+1}, \dots, a_n + h_n) - f(a_1, \dots, a_{i-1}, a_i, a_{i+1} + h_{i+1}, \dots, a_n + h_n) = h_iD_if(\mathbf b_i), \tag{A7.3}$$

where $\mathbf b_i = (a_1, \dots, a_{i-1}, b_i, a_{i+1} + h_{i+1}, \dots, a_n + h_n)$ for some $b_i \in (a_i, a_i + h_i)$. Then the $i$th term of $f(\mathbf a + \mathbf h)$ is $h_iD_if(\mathbf b_i)$. This allows us to rewrite Equation A7.2 as

$$f(\mathbf a + \mathbf h) = f(\mathbf a + \mathbf h) - f(\mathbf a) = \sum_{i=1}^{n} h_iD_if(\mathbf b_i). \tag{A7.4}$$

Proposition 3.3.19 is a useful tool, but it does not provide an explicit bound on how much the function can change, given a specific change to the variable: a statement that allows us to say, for example, “All partial derivatives of $f$ up to order $k = 3$ vanish, therefore, if we increase the variable by $h = 1/4$, the increment to $f$ will be $\le 1/64$, or $\le c \cdot 1/64$, where $c$ is a constant which we can evaluate.” Taylor’s theorem with remainder (Theorem A9.5) will provide such a statement.

The mean value theorem: If $f : [a, a + h] \to \mathbb R$ is continuous, and $f$ is differentiable on $(a, a + h)$, then there exists $b \in (a, a + h)$ such that

$$f'(b) = \frac{f(a + h) - f(a)}{h}; \quad \text{i.e.,} \quad f(a + h) - f(a) = hf'(b).$$

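Now we can restate our problem; we want to prove that

$$\lim_{\mathbf h \to \mathbf 0} \frac{f(\mathbf a + \mathbf h)}{|\mathbf h|^k} = \lim_{\mathbf h \to \mathbf 0} \frac{1}{|\mathbf h|}\,\frac{f(\mathbf a + \mathbf h)}{|\mathbf h|^{k-1}} = \lim_{\mathbf h \to \mathbf 0} \sum_{i=1}^{n} \frac{h_i}{|\mathbf h|}\,\frac{D_if(\mathbf b_i)}{|\mathbf h|^{k-1}} = 0. \tag{A7.5}$$

Since $|h_i|/|\mathbf h| \le 1$, this comes down to proving that

$$\lim_{\mathbf h \to \mathbf 0} \frac{D_if(\mathbf b_i)}{|\mathbf h|^{k-1}} = 0. \tag{A7.6}$$

Now set $\mathbf b_i = \mathbf a + \mathbf c_i$; i.e., $\mathbf c_i$ is the increment to $\mathbf a$ that produces $\mathbf b_i$. If we substitute this value for $\mathbf b_i$ in Equation A7.6, we now need to prove

$$\lim_{\mathbf h \to \mathbf 0} \frac{D_if(\mathbf a + \mathbf c_i)}{|\mathbf h|^{k-1}} = 0. \tag{A7.7}$$

By definition, all partial derivatives of $f$ to order $k$ exist, are continuous on $U$ and vanish at $\mathbf a$. By induction we may assume that Proposition 3.3.19 is true for $D_if$, so that

$$\lim_{\mathbf c_i \to \mathbf 0} \frac{D_if(\mathbf a + \mathbf c_i)}{|\mathbf c_i|^{k-1}} = 0. \tag{A7.8}$$

Thus we can assert that

$$\lim_{\mathbf h \to \mathbf 0} \frac{D_if(\mathbf a + \mathbf c_i)}{|\mathbf h|^{k-1}} = 0. \tag{A7.9}$$

In Equation A7.8 we are substituting $D_if$ for $f$ and $\mathbf c_i$ for $\mathbf h$ in Equation 3.3.39. You may object that in the denominator we now have $k - 1$ instead of $k$. But Equation 3.3.39 is true when $f$ is a $C^k$ function, and if $f$ is a $C^k$ function, then $D_if$ is a $C^{k-1}$ function.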
You may object to switching the $\mathbf c_i$ to $\mathbf h$. But we know that $|\mathbf c_i| \le |\mathbf h|$:

$$\mathbf c_i = \begin{bmatrix} 0\\ \vdots\\ 0\\ c_i\\ h_{i+1}\\ \vdots\\ h_n \end{bmatrix} \tag{A7.10}$$

and $c_i$ is between 0 and $h_i$.

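So Equation A7.8 is a stronger statement than Equation A7.9. Equation A7.9 tells us that for any $\epsilon$, there exists a $\delta$ such that if $|\mathbf h| < \delta$, then

$$\frac{|D_i f(\mathbf a + \mathbf h)|}{|\mathbf h|^{k-1}} \le \epsilon. \tag{A7.11}$$

If $|\mathbf h| < \delta$, then $|\mathbf c_i| < \delta$. And putting the bigger number $|\mathbf h|^{k-1}$ in the denominator just makes that quantity smaller. So we’re done:

$$\lim_{\mathbf h \to 0} \frac{f(\mathbf a + \mathbf h)}{|\mathbf h|^k} = \sum_{i=1}^n \lim_{\mathbf h \to 0} \frac{h_i}{|\mathbf h|}\,\frac{D_i f(\mathbf b_i)}{|\mathbf h|^{k-1}} = 0. \quad\square \tag{A7.12}$$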

A.8 Proof of Rules for Taylor Polynomials


Proposition 3.4.3 (Sums and products of Taylor polynomials). Let $U \subset \mathbb R^n$ be open, and $f, g : U \to \mathbb R$ be $C^k$ functions. Then $f + g$ and $fg$ are also of class $C^k$, and their Taylor polynomials are computed as follows.

(a) The Taylor polynomial of the sum is the sum of the Taylor polynomials:

$$P^k_{f+g,\mathbf a}(\mathbf a + \mathbf h) = P^k_{f,\mathbf a}(\mathbf a + \mathbf h) + P^k_{g,\mathbf a}(\mathbf a + \mathbf h). \tag{3.4.8}$$

(b) The Taylor polynomial of the product $fg$ is obtained by taking the product

$$P^k_{f,\mathbf a}(\mathbf a + \mathbf h) \cdot P^k_{g,\mathbf a}(\mathbf a + \mathbf h) \tag{3.4.9}$$

and discarding the terms of degree $> k$.


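Proposition 3.4.4 (Chain rule for Taylor polynomials). Let $U \subset \mathbb R^n$ and $V \subset \mathbb R$ be open, and $g : U \to V$, $f : V \to \mathbb R$ be of class $C^k$. Then $f \circ g : U \to \mathbb R$ is of class $C^k$, and if $g(\mathbf a) = b$, then the Taylor polynomial $P^k_{f \circ g,\mathbf a}(\mathbf a + \mathbf h)$ is obtained by considering the polynomial

$$P^k_{f,b}\bigl(P^k_{g,\mathbf a}(\mathbf a + \mathbf h)\bigr)$$

and discarding the terms of degree $> k$.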
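The two rules above can be tried out on one-variable examples. The following Python sketch is an illustration only and is not part of the original text: the helper names `mul_trunc` and `compose_trunc` are hypothetical, polynomials are stored as coefficient lists (index = degree) with exact rational coefficients, and all expansions are taken at $a = 0$ with $g(0) = 0$, so that $b = 0$.

```python
from fractions import Fraction
from math import factorial

def mul_trunc(p, q, k):
    """Multiply two polynomials and discard the terms of degree > k."""
    out = [Fraction(0)] * (k + 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j <= k:
                out[i + j] += a * b
    return out

def compose_trunc(f, g, k):
    """Substitute g into f (Horner's scheme), discarding terms of degree > k.

    Assumes g has zero constant term, so f is expanded at b = g(0) = 0.
    """
    result = [Fraction(0)] * (k + 1)
    for c in reversed(f):
        result = mul_trunc(result, g, k)
        result[0] += c
    return result

k = 5
# Degree-5 Taylor polynomials of exp and sin at 0.
exp_k = [Fraction(1, factorial(n)) for n in range(k + 1)]
sin_k = [Fraction(0) if n % 2 == 0 else Fraction((-1) ** (n // 2), factorial(n))
         for n in range(k + 1)]

product = mul_trunc(exp_k, sin_k, k)        # Taylor polynomial of exp(x) * sin(x)
composite = compose_trunc(exp_k, sin_k, k)  # Taylor polynomial of exp(sin(x))
print(product)
print(composite)
```

The product list gives $x + x^2 + x^3/3 - x^5/30$ and the composite gives $1 + x + x^2/2 - x^4/8 - x^5/15$, which agree with the known expansions of $e^x \sin x$ and $e^{\sin x}$.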
Big O has an implied constant, while little o does not: big O provides more information.

Notation with big O “significantly simplifies calculations be-
cause it allows us to be sloppy — 
but in a satisfactorily controlled 
way.”— Donald Knuth, Stanford 
University ( Notices of the AMS, 
Vol. 45, No. 6, p. 688). 


These results follow from some rules for doing arithmetic with little o and 
big O. Little o was defined in Definition 3.4.1. 

Definition A8.1 (Big O). If $h(\mathbf x) > 0$ in some neighborhood of 0, then a function $f$ is in $O(h)$ if there exist a constant $C$ and $\delta > 0$ such that $|f(\mathbf x)| \le C h(\mathbf x)$ when $0 < |\mathbf x| < \delta$; this should be read “$f$ is at most of order $h(\mathbf x)$.”

Below, to lighten the notation, we write $O(|\mathbf x|^k) + O(|\mathbf x|^l) = O(|\mathbf x|^k)$ to mean that if $f \in O(|\mathbf x|^k)$ and $g \in O(|\mathbf x|^l)$, then $f + g \in O(|\mathbf x|^k)$; we use similar notation for products and compositions.


Proposition A8.2 (Addition and multiplication rules for o and O).

Suppose that $0 \le k \le l$ are two integers. Then


For example, if $f \in O(|\mathbf x|^2)$ and $g \in O(|\mathbf x|^3)$, then $f + g$ is in $O(|\mathbf x|^2)$ (the least restrictive of the $O$, since big O is defined in a neighborhood of zero). However, the constants $C$ for the two $O(|\mathbf x|^2)$ may differ.

Similarly, if $f \in o(|\mathbf x|^2)$ and $g \in o(|\mathbf x|^3)$, then $f + g$ is in $o(|\mathbf x|^2)$, but for a given $\epsilon$, the $\delta$ for $f \in o(|\mathbf x|^2)$ may not be the same as the $\delta$ for $f + g \in o(|\mathbf x|^2)$.

In Equation A8.2, note that the terms to the left and right of the second inequality are identical except that the $C_2|\mathbf x|^l$ on the left becomes $C_2|\mathbf x|^k$ on the right.


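Formulas for addition:

1. $O(|\mathbf x|^k) + O(|\mathbf x|^l) = O(|\mathbf x|^k)$

2. $o(|\mathbf x|^k) + o(|\mathbf x|^l) = o(|\mathbf x|^k)$

3. $o(|\mathbf x|^k) + O(|\mathbf x|^l) = o(|\mathbf x|^k)$ if $k < l$

Formulas for multiplication:

4. $O(|\mathbf x|^k)\, O(|\mathbf x|^l) = O(|\mathbf x|^{k+l})$

5. $o(|\mathbf x|^k)\, O(|\mathbf x|^l) = o(|\mathbf x|^{k+l})$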

Proof. The formulas for addition and multiplication are more or less obvious; 
half the work is figuring out exactly what they mean. 

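Addition formulas. For the first of the addition formulas, the hypothesis is that we have functions $f(\mathbf x)$ and $g(\mathbf x)$, and that there exist $\delta > 0$ and constants $C_1$ and $C_2$ such that when $0 < |\mathbf x| < \delta$,

$$|f(\mathbf x)| \le C_1|\mathbf x|^k \quad\text{and}\quad |g(\mathbf x)| \le C_2|\mathbf x|^l. \tag{A8.1}$$

If $\delta_1 = \inf\{\delta, 1\}$, and $C = C_1 + C_2$, then when $0 < |\mathbf x| < \delta_1$,

$$|f(\mathbf x) + g(\mathbf x)| \le C_1|\mathbf x|^k + C_2|\mathbf x|^l \le C_1|\mathbf x|^k + C_2|\mathbf x|^k = C|\mathbf x|^k. \tag{A8.2}$$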


All these proofs are essentially 
identical; they are exercises in fine 
shades of meaning. 


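For the second, the hypothesis is that

$$\lim_{|\mathbf x| \to 0} \frac{f(\mathbf x)}{|\mathbf x|^k} = 0 \quad\text{and}\quad \lim_{|\mathbf x| \to 0} \frac{g(\mathbf x)}{|\mathbf x|^l} = 0. \tag{A8.3}$$

Since $l \ge k$, we have $\lim_{|\mathbf x| \to 0} \dfrac{g(\mathbf x)}{|\mathbf x|^k} = 0$ also, so

$$\lim_{|\mathbf x| \to 0} \frac{f(\mathbf x) + g(\mathbf x)}{|\mathbf x|^k} = 0. \tag{A8.4}$$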
For the statements concerning composition, recall that $f$ goes from $U$, a subset of $\mathbb R^n$, to $V$, a subset of $\mathbb R$, while $g$ goes from $V$ to $\mathbb R$, so $g \circ f$ goes from a subset of $\mathbb R^n$ to $\mathbb R$. Since $g$ goes from a subset of $\mathbb R$ to $\mathbb R$, the variable for the first term is $x$, not $\mathbf x$.

To prove the first and third statements about composition, the requirement that $l > 0$ is essential. When $l = 0$, saying that $f \in O(|\mathbf x|^l) = O(1)$ is just saying that $f$ is bounded in a neighborhood of 0; that does not guarantee that its values can be the input for $g$, or be in the region where we know anything about $g$.

In the second statement about composition, saying $f \in o(1)$ precisely says that for all $\epsilon$, there exists $\delta$ such that when $|\mathbf x| < \delta$, then $|f(\mathbf x)| \le \epsilon$; i.e.,

$$\lim_{\mathbf x \to 0} f(\mathbf x) = 0.$$

So the values of $f$ are in the domain of $g$ for $|\mathbf x|$ sufficiently small.


The third follows from the second, since $g \in O(|\mathbf x|^l)$ implies that $g \in o(|\mathbf x|^k)$ when $l > k$. (Can you justify that statement?$^3$)

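Multiplication formulas. The multiplication formulas are similar. For the first, the hypothesis is again that we have functions $f(\mathbf x)$ and $g(\mathbf x)$, and that there exist $\delta > 0$ and constants $C_1$ and $C_2$ such that when $|\mathbf x| < \delta$,

$$|f(\mathbf x)| \le C_1|\mathbf x|^k, \quad |g(\mathbf x)| \le C_2|\mathbf x|^l. \tag{A8.5}$$

Then $|f(\mathbf x)g(\mathbf x)| \le C_1C_2|\mathbf x|^{k+l}$.

For the second, the hypothesis is the same for $f$, and for $g$ we know that for every $\epsilon$ there exists $\eta$ such that if $|\mathbf x| < \eta$, then $|g(\mathbf x)| \le \epsilon|\mathbf x|^l$. When $|\mathbf x| < \eta$,

$$|f(\mathbf x)g(\mathbf x)| \le C_1\epsilon|\mathbf x|^{k+l}, \tag{A8.6}$$

so

$$\lim_{|\mathbf x| \to 0} \frac{|f(\mathbf x)g(\mathbf x)|}{|\mathbf x|^{k+l}} = 0. \tag{A8.7}$$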


To speak of Taylor polynomials of compositions, we need to be sure that the compositions are defined. Let $U$ be a neighborhood of 0 in $\mathbb R^n$, and $V$ be a neighborhood of 0 in $\mathbb R$. We will write Taylor polynomials for compositions $g \circ f$, where $f : U - \{0\} \to \mathbb R$ and $g : V \to \mathbb R$:

$$\begin{array}{ccc} U - \{0\} & \xrightarrow{\ f\ } & \mathbb R \\ & & \cup \\ & & V \xrightarrow{\ g\ } \mathbb R \end{array} \tag{A8.8}$$

We must insist that $g$ be defined at 0, since no reasonable condition will prevent 0 from being a value of $f$. In particular, when we require $g \in O(x^k)$, we need to specify $k > 0$. Moreover, $f(\mathbf x)$ must be in $V$ when $|\mathbf x|$ is sufficiently small; so if $f \in O(|\mathbf x|^l)$ we must have $l > 0$, and if $f \in o(|\mathbf x|^l)$ we must have $l \ge 0$. This explains the restrictions on the exponents in Proposition A8.3.


Proposition A8.3 (Composition rules for o and O). Let $f : U - \{0\} \to \mathbb R$ and $g : V \to \mathbb R$ be functions, where $U$ is a neighborhood of 0 in $\mathbb R^n$, and $V \subset \mathbb R$ is a neighborhood of 0. We will assume throughout that $k > 0$.

1. If $g \in O(|x|^k)$ and $f \in O(|\mathbf x|^l)$, then $g \circ f \in O(|\mathbf x|^{kl})$, if $l > 0$.

2. If $g \in O(|x|^k)$ and $f \in o(|\mathbf x|^l)$, then $g \circ f \in o(|\mathbf x|^{kl})$, if $l \ge 0$.

3. If $g \in o(|x|^k)$ and $f \in O(|\mathbf x|^l)$, then $g \circ f \in o(|\mathbf x|^{kl})$, if $l > 0$.


Proof. For formula 1, the hypothesis is that we have functions $f(\mathbf x)$ and $g(x)$, and that there exist $\delta_1 > 0$, $\delta_2 > 0$, and constants $C_1$ and $C_2$ such that when $|x| < \delta_1$ and $|\mathbf x| < \delta_2$,

$$|g(x)| \le C_1|x|^k, \quad |f(\mathbf x)| \le C_2|\mathbf x|^l. \tag{A8.9}$$

$^3$ Let’s set $l = 3$ and $k = 2$. Then in an appropriate neighborhood, $|g(\mathbf x)| \le C|\mathbf x|^3 = C|\mathbf x||\mathbf x|^2$; by taking $|\mathbf x|$ sufficiently small, we can make $C|\mathbf x| < \epsilon$.
We may have $f(\mathbf x) = 0$, but we have required that $g(0) = 0$ and that $k > 0$, so the composition is defined even at such values of $\mathbf x$.


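Since $l > 0$, $f(\mathbf x)$ is small when $|\mathbf x|$ is small, so the composition $g(f(\mathbf x))$ is defined for $|\mathbf x|$ sufficiently small: i.e., we may suppose that $\eta > 0$ is chosen so that $\eta < \delta_2$, and that $|f(\mathbf x)| < \delta_1$ when $|\mathbf x| < \eta$. Then

$$|g(f(\mathbf x))| \le C_1|f(\mathbf x)|^k \le C_1\bigl(C_2|\mathbf x|^l\bigr)^k = C_1C_2^k|\mathbf x|^{kl}. \tag{A8.10}$$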

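For formula 2, we know as above that there exist $C$ and $\delta_1 > 0$ such that $|g(x)| < C|x|^k$ when $|x| < \delta_1$. Choose $\epsilon > 0$; for $f$ we know that there exists $\delta_2 > 0$ such that $|f(\mathbf x)| \le \epsilon|\mathbf x|^l$ when $|\mathbf x| < \delta_2$. Taking $\delta_2$ smaller if necessary, we may also suppose $\epsilon\delta_2^l < \delta_1$. Then when $|\mathbf x| < \delta_2$, we have

$$|g(f(\mathbf x))| \le C|f(\mathbf x)|^k \le C\bigl(\epsilon|\mathbf x|^l\bigr)^k = \underbrace{C\epsilon^k}_{\text{an arbitrarily small } \epsilon}|\mathbf x|^{kl}. \tag{A8.11}$$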

This is where we are using the fact that $l > 0$. If $l = 0$, then making $\delta_2$ small would not make $C\delta_2^l$ small.


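For formula 3, our hypothesis $g \in o(|x|^k)$ asserts that for any $\epsilon > 0$ there exists $\delta_1 > 0$ such that $|g(x)| < \epsilon|x|^k$ when $|x| < \delta_1$.

Now our hypothesis on $f$ says that there exist $C$ and $\delta_2 > 0$ such that $|f(\mathbf x)| < C|\mathbf x|^l$ when $|\mathbf x| < \delta_2$; taking $\delta_2$ smaller if necessary, we may further assume that $C\delta_2^l < \delta_1$. Then if $|\mathbf x| < \delta_2$,

$$|g(f(\mathbf x))| \le \epsilon|f(\mathbf x)|^k \le \epsilon\bigl|C|\mathbf x|^l\bigr|^k = \epsilon C^k|\mathbf x|^{kl}. \quad\square \tag{A8.12}$$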


Proving Propositions 3.4.3 and 3.4.4 

We are ready now to use Propositions A8.2 and A8.3 to prove Propositions 
3.4.3 and 3.4.4. There are two parts to each of these propositions: one asserts 
that sums, products and compositions of C k functions are of class C k ; the other 
tells how to compute their Taylor polynomials. 

The first part is proved by induction on $k$, using the second part. The rules for computing Taylor polynomials say that the $(k-1)$st partial derivatives of a sum, product, or composition are themselves complicated sums of products and compositions of derivatives, of order at most $k - 1$, of the given $C^k$ functions. As such, they are themselves continuously differentiable, by Theorems 1.8.1 and 1.8.2. So the sums, products and compositions are of class $C^k$.

Computing sums and products of Taylor polynomials. The case of sums follows immediately from the second statement of Proposition A8.2. For products, suppose

$$f(\mathbf x) = p_k(\mathbf x) + r_k(\mathbf x) \quad\text{and}\quad g(\mathbf x) = q_k(\mathbf x) + s_k(\mathbf x), \tag{A8.13}$$

with $r_k, s_k \in o(|\mathbf x|^k)$. Multiply

$$f(\mathbf x)g(\mathbf x) = \bigl(p_k(\mathbf x) + r_k(\mathbf x)\bigr)\bigl(q_k(\mathbf x) + s_k(\mathbf x)\bigr) = P_k(\mathbf x) + R_k(\mathbf x), \tag{A8.14}$$

where $P_k(\mathbf x)$ is obtained by multiplying $p_k(\mathbf x)q_k(\mathbf x)$ and keeping the terms of degree between 1 and $k$. The remainder $R_k(\mathbf x)$ contains the higher-degree terms of the product $p_k(\mathbf x)q_k(\mathbf x)$, which of course are in $o(|\mathbf x|^k)$. It also contains the products $r_k(\mathbf x)s_k(\mathbf x)$, $r_k(\mathbf x)q_k(\mathbf x)$, and $p_k(\mathbf x)s_k(\mathbf x)$, which are of the following forms:

$$\begin{aligned} O(1)\,s_k(\mathbf x) &\in o(|\mathbf x|^k);\\ r_k(\mathbf x)\,O(1) &\in o(|\mathbf x|^k);\\ r_k(\mathbf x)s_k(\mathbf x) &\in o(|\mathbf x|^{2k}). \end{aligned} \tag{A8.15}$$

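Computing compositions of Taylor polynomials. Finally we come to the compositions. Let us denote

$$f(\mathbf a + \mathbf h) = \underbrace{b}_{\text{constant term}} + \underbrace{Q^k_{f,\mathbf a}(\mathbf h)}_{\substack{\text{polynomial terms}\\ 1 \le \text{degree} \le k}} + \underbrace{r^k_{f,\mathbf a}(\mathbf h)}_{\text{remainder}} \tag{A8.16}$$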


Note that m is an integer, not 
a multi-index, since g is a function 
of a single variable. 


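separating out the constant term; the polynomial terms of degree between 1 and $k$, so that $|Q^k_{f,\mathbf a}(\mathbf h)| \in O(|\mathbf h|)$; and the remainder satisfying $r^k_{f,\mathbf a}(\mathbf h) \in o(|\mathbf h|^k)$. Then

$$g \circ f(\mathbf a + \mathbf h) = P^k_{g,b}\bigl(b + Q^k_{f,\mathbf a}(\mathbf h) + r^k_{f,\mathbf a}(\mathbf h)\bigr) + r^k_{g,b}\bigl(b + Q^k_{f,\mathbf a}(\mathbf h) + r^k_{f,\mathbf a}(\mathbf h)\bigr). \tag{A8.17}$$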

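Among the terms in the sum above, there are the terms of $P^k_{g,b}\bigl(b + Q^k_{f,\mathbf a}(\mathbf h)\bigr)$ of degree at most $k$ in $\mathbf h$; we must show that all the others are in $o(|\mathbf h|^k)$.

Most prominent of these is

$$r^k_{g,b}\bigl(b + Q^k_{f,\mathbf a}(\mathbf h) + r^k_{f,\mathbf a}(\mathbf h)\bigr) \in o\bigl(|O(|\mathbf h|) + o(|\mathbf h|^k)|^k\bigr) = o\bigl(|O(|\mathbf h|)|^k\bigr) = o(|\mathbf h|^k), \tag{A8.18}$$

using part (3) of Proposition A8.3.

The other terms are of the form

$$\frac{1}{m!}D^m g(b)\bigl(b + Q^k_{f,\mathbf a}(\mathbf h) + r^k_{f,\mathbf a}(\mathbf h)\bigr)^m. \tag{A8.19}$$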


In Landau’s notation, Equation A9.1 says that if $f$ is of class $C^{k+1}$ near $\mathbf a$, then not only is

$$f(\mathbf a + \mathbf h) - P^k_{f,\mathbf a}(\mathbf a + \mathbf h)$$

in $o(|\mathbf h|^k)$; it is in fact in $O(|\mathbf h|^{k+1})$; Theorem A9.7 gives a formula for the constant implicit in the $O$.


If we multiply out the power, we find some terms of degree at most $k$ in the coordinates $h_i$ of $\mathbf h$, and no factors $r^k_{f,\mathbf a}(\mathbf h)$: these are precisely the terms we are keeping in our candidate Taylor polynomial for the composition. Then there are those of degree greater than $k$ in the $h_i$ that still have no factors $r^k_{f,\mathbf a}(\mathbf h)$, which are evidently in $o(|\mathbf h|^k)$, and those which contain at least one factor $r^k_{f,\mathbf a}(\mathbf h)$. These last are in $O(1)\,o(|\mathbf h|^k) = o(|\mathbf h|^k)$. $\square$

A.9 Taylor’s Theorem with Remainder


It is all very well to claim (Theorem 3.3.18, part (b)) that

$$\lim_{\mathbf h \to 0} \frac{f(\mathbf a + \mathbf h) - P^k_{f,\mathbf a}(\mathbf a + \mathbf h)}{|\mathbf h|^k} = 0; \tag{A9.1}$$

that doesn’t tell you how small the difference $f(\mathbf a + \mathbf h) - P^k_{f,\mathbf a}(\mathbf a + \mathbf h)$ is for any particular $\mathbf h \ne 0$.

Taylor’s theorem with remainder gives such a bound, in the form of a multiple of $|\mathbf h|^{k+1}$. You cannot get such a result without requiring a bit more about the function $f$; we will assume that all derivatives up to order $k + 1$ exist and are continuous.

Recall Taylor’s theorem with remainder in one dimension: 


When $k = 0$, Equation A9.2 is the fundamental theorem of calculus:

$$g(a + h) = g(a) + \underbrace{\int_0^h g'(a + t)\,dt}_{\text{remainder}}.$$


Theorem A9.1 (Taylor’s theorem with remainder in one dimension). If $g$ is $(k+1)$-times continuously differentiable on $(a - R, a + R)$, then, for $|h| < R$,

$$g(a + h) = \underbrace{g(a) + g'(a)h + \cdots + \frac{1}{k!}g^{(k)}(a)h^k}_{P^k_{g,a}(a+h)\ \text{(Taylor polynomial of } g \text{ at } a\text{, of degree } k)} + \underbrace{\frac{1}{k!}\int_0^h (h - t)^k g^{(k+1)}(a + t)\,dt}_{\text{remainder}}. \tag{A9.2}$$
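The integral form of the remainder can be checked numerically. The following Python sketch is an illustration only: it takes $g = \exp$ (so every derivative of $g$ is again $\exp$), the sample values $a = 0$, $h = 0.5$, $k = 3$ are arbitrary choices, and the integral is approximated with a composite Simpson rule written out by hand (the helper `simpson` is hypothetical, not a library call).

```python
import math

def simpson(func, lo, hi, n=1000):
    """Composite Simpson rule on [lo, hi] with n (even) subintervals."""
    step = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * step)
    return total * step / 3

def taylor_with_integral_remainder(a, h, k):
    """Return (Taylor polynomial, integral remainder) for g = exp."""
    poly = sum(math.exp(a) * h**j / math.factorial(j) for j in range(k + 1))
    remainder = simpson(lambda t: (h - t)**k * math.exp(a + t), 0.0, h) / math.factorial(k)
    return poly, remainder

poly, remainder = taylor_with_integral_remainder(a=0.0, h=0.5, k=3)
print(poly + remainder, math.exp(0.5))
```

Up to quadrature and rounding error, the two printed numbers agree, as the theorem asserts.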


We made the change of variables $s = a + t$, so that as $t$ goes from 0 to $h$, $s$ goes from $a$ to $x$.


Proof. The standard proof is by repeated integration by parts; you are asked to use that approach in Exercise A9.3. Here is an alternative proof (slicker and less natural). First, rewrite Equation A9.2 setting $x = a + h$:

$$g(x) = g(a) + g'(a)(x - a) + \cdots + \frac{1}{k!}g^{(k)}(a)(x - a)^k + \frac{1}{k!}\int_a^x (x - s)^k g^{(k+1)}(s)\,ds. \tag{A9.3}$$

Now think of both sides as functions of $a$, with $x$ held constant. The two sides are equal when $a = x$: all the terms on the right-hand side vanish except the first, giving $g(x) = g(x)$. If we can show that as $a$ varies and $x$ stays fixed, the right-hand side stays constant, then we will know that the two sides are always equal. So we compute the derivative of the right-hand side:

$$g'(a) + \bigl(-g'(a) + (x - a)g''(a)\bigr) + \Bigl(-(x - a)g''(a) + \frac{(x - a)^2 g'''(a)}{2!}\Bigr) + \cdots + \Bigl(-\frac{(x - a)^{k-1}g^{(k)}(a)}{(k-1)!} + \frac{(x - a)^k g^{(k+1)}(a)}{k!}\Bigr) \underbrace{-\,\frac{(x - a)^k g^{(k+1)}(a)}{k!}}_{\text{derivative of the remainder}}, \tag{A9.4}$$

where the last term is the derivative of the integral, computed by the fundamental theorem of calculus. Each positive term is followed by its negative, so the consecutive pairs sum to 0; a careful look shows that everything drops out. $\square$
Evaluating the remainder: in one dimension

To use Taylor’s theorem with remainder, you must “evaluate” the remainder. It is not useful to compute the integral; if you do this, by repeated integration by parts, you get exactly the other terms in the formula.


Another approach to Corollary A9.3 is to say that there exists $c$ between $a$ and $x$ such that

$$\frac{1}{k!}\int_a^x (x - t)^k g^{(k+1)}(t)\,dt = \frac{1}{k!}(x - a)(x - c)^k g^{(k+1)}(c).$$


Theorem A9.2. There exists $c$ between $a$ and $x$ such that

$$f(a + h) = P^k_{f,a}(a + h) + \frac{f^{(k+1)}(c)}{(k+1)!}h^{k+1}.$$

Corollary A9.3. If $|f^{(k+1)}(a + t)| \le C$ for $t$ between 0 and $h$, then

$$\bigl|f(a + h) - P^k_{f,a}(a + h)\bigr| \le \frac{C}{(k+1)!}|h|^{k+1}.$$


A calculator that computes to 
eight places can store Equation 
A9.5, and spit it out when you 
evaluate sines; even hand calcula- 
tion isn’t out of the question. This 
is how the original trigonometric 
tables were computed. 

Computing large factorials is quicker if you know that 6! = 720.

It isn’t often that high derivatives of functions can be so easily bounded; usually using Taylor’s theorem with remainder is much messier.


Example A9.4 (Finding a bound for the remainder in one dimension). A standard example of this sort of thing is to compute $\sin\theta$ to eight decimals when $|\theta| \le \pi/6$. Since the successive derivatives of $\sin\theta$ are all sines and cosines, they are all bounded by 1, so the remainder after taking $k$ terms of the Taylor polynomial is at most

$$\frac{1}{(k+1)!}\left(\frac{\pi}{6}\right)^{k+1} \tag{A9.5}$$

for $|\theta| \le \pi/6$. Take $k = 8$ (found by trial and error); $1/9! \approx 2.7557319 \times 10^{-6}$ and $(\pi/6)^9 \approx 2.95793 \times 10^{-3}$; the error is then at most $8.1513 \times 10^{-9}$. Thus we can be sure that

$$\sin\theta = \theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \frac{\theta^7}{7!} + \frac{\theta^9}{9!} \tag{A9.6}$$

to eight decimals when $|\theta| \le \pi/6$. $\triangle$
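The bound in Equation A9.5 can be evaluated directly and compared with the actual error. The following Python sketch is an illustration only: the helper `sin_taylor` is a hypothetical name, it keeps the terms of degree at most $k = 8$ (so the $\theta^9$ term is not used), and the grid of 2001 sample points is an arbitrary choice.

```python
import math

k = 8
theta_max = math.pi / 6

# Bound (A9.5): (pi/6)^(k+1) / (k+1)!
bound = theta_max ** (k + 1) / math.factorial(k + 1)

def sin_taylor(theta, degree=k):
    """Taylor polynomial of sin at 0, keeping the terms of degree <= `degree`."""
    return sum((-1) ** (m // 2) * theta ** m / math.factorial(m)
               for m in range(1, degree + 1, 2))

# Worst observed error on a grid covering [-pi/6, pi/6].
samples = [theta_max * i / 1000 for i in range(-1000, 1001)]
worst = max(abs(math.sin(t) - sin_taylor(t)) for t in samples)
print(f"bound = {bound:.4e}, worst observed error = {worst:.4e}")
```

The observed error stays below the bound, and the bound itself is below $10^{-8}$.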


Taylor’s theorem with remainder in higher dimensions 


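Theorem A9.5 (Taylor’s theorem with remainder in higher dimensions). Let $U \subset \mathbb R^n$ be open, let $f : U \to \mathbb R$ be a function of class $C^{k+1}$, and suppose that the interval $[\mathbf a, \mathbf a + \mathbf h]$ is contained in $U$. Then there exists $\mathbf c \in [\mathbf a, \mathbf a + \mathbf h]$ such that

$$f(\mathbf a + \mathbf h) = P^k_{f,\mathbf a}(\mathbf a + \mathbf h) + \sum_{I \in \mathcal I_n^{k+1}} \frac{1}{I!}D_I f(\mathbf c)\,\mathbf h^I. \tag{A9.7}$$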
Proof. Define $\varphi(t) = \mathbf a + t\mathbf h$, and consider the scalar-valued function of one variable $g(t) = f(\varphi(t))$. Theorem A9.2 applied to $g$ when $h = 1$ and $a = 0$ says that there exists $c$ with $0 < c < 1$ such that

$$g(1) = \underbrace{g(0) + \cdots + \frac{g^{(k)}(0)}{k!}}_{\text{Taylor polynomial}} + \underbrace{\frac{g^{(k+1)}(c)}{(k+1)!}}_{\text{remainder}}. \tag{A9.8}$$


We need to show that the various terms of Equation A9.8 are the same as the corresponding terms of Equation A9.7. That the two left-hand sides are equal is obvious; by definition, $g(1) = f(\mathbf a + \mathbf h)$. That the Taylor polynomials and the remainders are the same follows from the chain rule for Taylor polynomials.

To show that the Taylor polynomials are the same, we write

$$P^k_{g,0}(t) = P^k_{f,\mathbf a}\bigl(P^k_{\varphi,0}(t)\bigr) = \sum_{m=0}^k \sum_{I \in \mathcal I_n^m} \frac{1}{I!}D_I f(\mathbf a)(t\mathbf h)^I = \sum_{m=0}^k \Bigl(\sum_{I \in \mathcal I_n^m} \frac{1}{I!}D_I f(\mathbf a)\,\mathbf h^I\Bigr)t^m. \tag{A9.9}$$

This shows that

$$g(0) + \cdots + \frac{g^{(k)}(0)}{k!} = P^k_{f,\mathbf a}(\mathbf a + \mathbf h). \tag{A9.10}$$

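For the remainder, set $\mathbf c = \varphi(c)$. Again the chain rule for Taylor polynomials gives

$$P^{k+1}_{g,c}(c + t) = P^{k+1}_{f,\mathbf c}\bigl(P^{k+1}_{\varphi,c}(c + t)\bigr) = \sum_{m=0}^{k+1} \sum_{I \in \mathcal I_n^m} \frac{1}{I!}D_I f(\mathbf c)(t\mathbf h)^I = \sum_{m=0}^{k+1} \Bigl(\sum_{I \in \mathcal I_n^m} \frac{1}{I!}D_I f(\mathbf c)\,\mathbf h^I\Bigr)t^m. \tag{A9.11}$$

Looking at the terms of degree $k + 1$ on both sides gives the desired result:

$$\frac{g^{(k+1)}(c)}{(k+1)!} = \sum_{I \in \mathcal I_n^{k+1}} \frac{1}{I!}D_I f(\mathbf c)\,\mathbf h^I. \quad\square \tag{A9.12}$$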

There are many different ways of turning this into a bound on the remainder; 
they yield somewhat different results. We will use the following lemma. 


We call Lemma A9.6 the poly- 
nomial formula because it general- 
izes the binomial formula to poly- 
nomials. This result is rather nice 
in its own right, and shows how 
multi-index notation can simplify 
complicated formulas. 


Lemma A9.6 (Polynomial formula).

$$\sum_{I \in \mathcal I_n^k} \frac{1}{I!}\mathbf h^I = \frac{1}{k!}(h_1 + \cdots + h_n)^k. \tag{A9.13}$$

Proof. We will prove this by induction on $n$. When $n = 1$, there is nothing to prove: the lemma simply asserts $h^m = h^m$.
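The last step is the binomial theorem.

Suppose the formula is true for $n$, and let

$$\mathbf h = \begin{bmatrix} h_1\\ \vdots\\ h_{n+1} \end{bmatrix}, \quad\text{and we will denote}\quad \mathbf h' = \begin{bmatrix} h_1\\ \vdots\\ h_n \end{bmatrix}.$$

Let us simply compute:

$$\begin{aligned}
\sum_{I \in \mathcal I_{n+1}^k} \frac{1}{I!}\mathbf h^I
&= \sum_{m=0}^k \sum_{J \in \mathcal I_n^m} \frac{1}{J!}(\mathbf h')^J \frac{1}{(k-m)!}h_{n+1}^{k-m}\\
&= \sum_{m=0}^k \underbrace{\frac{1}{m!}(h_1 + \cdots + h_n)^m}_{\text{by induction on } n} \frac{1}{(k-m)!}h_{n+1}^{k-m}\\
&= \frac{1}{k!}\sum_{m=0}^k \frac{k!}{(k-m)!\,m!}(h_1 + \cdots + h_n)^m h_{n+1}^{k-m}\\
&= \frac{1}{k!}(h_1 + \cdots + h_n + h_{n+1})^k. \quad\square
\end{aligned} \tag{A9.14}$$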
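The polynomial formula is easy to test on concrete numbers. The following Python sketch is an illustration only: the helper names `multi_indices`, `lhs` and `rhs` are hypothetical, the sample vector is arbitrary, and exact rational arithmetic is used so that the two sides can be compared with `==`.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def multi_indices(n, k):
    """All n-tuples of non-negative integers whose entries sum to k."""
    return [I for I in product(range(k + 1), repeat=n) if sum(I) == k]

def lhs(h, k):
    """Sum over multi-indices I of total degree k of h^I / I!."""
    total = Fraction(0)
    for I in multi_indices(len(h), k):
        term = Fraction(1)
        for h_j, i_j in zip(h, I):
            term *= Fraction(h_j) ** i_j / factorial(i_j)
        total += term
    return total

def rhs(h, k):
    """(h_1 + ... + h_n)^k / k!"""
    return Fraction(sum(h)) ** k / factorial(k)

h = [Fraction(1, 2), Fraction(-3), Fraction(2, 5)]
checks = [lhs(h, k) == rhs(h, k) for k in range(6)]
print(checks)
```

For $n = 2$ the identity reduces to the binomial theorem divided by $k!$.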


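This, together with Theorem A9.5, immediately gives the following result.

Theorem A9.7 (An explicit formula for the Taylor remainder). Let $U \subset \mathbb R^n$ be open, $f : U \to \mathbb R$ a function of class $C^{k+1}$, and suppose that the interval $[\mathbf a, \mathbf a + \mathbf h]$ is contained in $U$. If

$$\sup_{I \in \mathcal I_n^{k+1}}\ \sup_{\mathbf c \in [\mathbf a, \mathbf a + \mathbf h]} |D_I f(\mathbf c)| \le C, \tag{A9.15}$$

then

$$\bigl|f(\mathbf a + \mathbf h) - P^k_{f,\mathbf a}(\mathbf a + \mathbf h)\bigr| \le \frac{C}{(k+1)!}\Bigl(\sum_{i=1}^n |h_i|\Bigr)^{k+1}. \tag{A9.16}$$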

A.10 Proof of Theorem 3.5.3 (Procedure for Completing Squares)


Theorem 3.5.3 (Quadratic forms as sums of squares). (a) For any quadratic form $Q(\mathbf x)$ on $\mathbb R^n$, there exist linearly independent linear functions $\alpha_1(\mathbf x), \dots, \alpha_m(\mathbf x)$ such that

$$Q(\mathbf x) = (\alpha_1(\mathbf x))^2 + \cdots + (\alpha_k(\mathbf x))^2 - (\alpha_{k+1}(\mathbf x))^2 - \cdots - (\alpha_{k+l}(\mathbf x))^2. \tag{3.5.3}$$

(b) The number $k$ of plus signs and the number $l$ of minus signs in such a decomposition depends only on $Q$ and not on the specific linear functions chosen.
Proof. Part (b) is proved in Section 3.5. To prove part (a) we need to formalize the completion of squares procedure; we will argue by induction on the number of variables appearing in $Q$.

Let $Q : \mathbb R^n \to \mathbb R$ be a quadratic form. Clearly, if only one variable $x_i$ appears, then $Q(\mathbf x) = \pm a x_i^2$ with $a > 0$, so $Q(\mathbf x) = \pm(\sqrt a\, x_i)^2$, and the theorem is true. So suppose it is true for all quadratic forms in which at most $k - 1$ variables appear, and suppose $k$ variables appear in the expression of $Q$. Let $x_i$ be such a variable; there are then two possibilities: either (1), a term $\pm a x_i^2$ appears with $a > 0$, or (2), it doesn’t.

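(1) If a term $\pm a x_i^2$ appears with $a > 0$, we can then write

$$Q(\mathbf x) = \pm\Bigl(a x_i^2 + \beta(\mathbf x)x_i + \frac{(\beta(\mathbf x))^2}{4a}\Bigr) + Q_1(\mathbf x) = \pm\Bigl(\sqrt a\, x_i + \frac{\beta(\mathbf x)}{2\sqrt a}\Bigr)^2 + Q_1(\mathbf x) \tag{A10.1}$$

where $\beta$ is a linear function of the $k - 1$ variables appearing in $Q$ other than $x_i$, and $Q_1$ is a quadratic form in the same variables. By induction, we can write

$$Q_1(\mathbf x) = \pm(\alpha_1(\mathbf x))^2 \pm \cdots \pm (\alpha_m(\mathbf x))^2 \tag{A10.2}$$

for some linearly independent linear functions $\alpha_j(\mathbf x)$ of the $k - 1$ variables appearing in $Q$ other than $x_i$.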

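We must check the linear independence of the linear functions $\alpha_0, \alpha_1, \dots, \alpha_m$, where by definition

$$\alpha_0(\mathbf x) = \sqrt a\, x_i + \frac{\beta(\mathbf x)}{2\sqrt a}. \tag{A10.3}$$

Suppose

$$c_0\alpha_0 + \cdots + c_m\alpha_m = 0; \tag{A10.4}$$

then

$$(c_0\alpha_0 + \cdots + c_m\alpha_m)(\mathbf x) = 0 \tag{A10.5}$$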


Recall that $\beta$ is a function of the variables other than $x_i$; thus when those variables are 0, so is $\beta(\mathbf e_i)$ (as are $\alpha_1(\mathbf e_i), \dots, \alpha_m(\mathbf e_i)$).


for every $\mathbf x$, in particular, $\mathbf x = \mathbf e_i$, when $x_i = 1$ and all the other variables are 0. This leads to

$$c_0\sqrt a = 0, \quad\text{so } c_0 = 0, \tag{A10.6}$$

so Equation A10.4 and the linear independence of $\alpha_1, \dots, \alpha_m$ imply $c_1 = \cdots = c_m = 0$.

(2) If no term $\pm a x_i^2$ appears, then there must be a term of the form $\pm a x_i x_j$ with $a > 0$. Make the substitution $x_j = x_i + u$; we can now write

$$Q(\mathbf x) = \pm\Bigl(a x_i^2 + \beta(\mathbf x, u)x_i + \frac{(\beta(\mathbf x, u))^2}{4a}\Bigr) + Q_1(\mathbf x, u) = \pm\Bigl(\sqrt a\, x_i + \frac{\beta(\mathbf x, u)}{2\sqrt a}\Bigr)^2 + Q_1(\mathbf x, u), \tag{A10.7}$$

where $\beta$ and $Q_1$ are functions (linear and quadratic respectively) of $u$ and of the variables that appear in $Q$ other than $x_i$ and $x_j$. Now argue exactly as above; the only subtle point is that in order to prove $c_0 = 0$ you need to set $u = 0$, i.e., to set $x_i = x_j = 1$. $\square$
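One step of case (1) of the proof can be sketched in code. The following Python fragment is an illustration only, under stated assumptions: the quadratic form is given as $Q(\mathbf x) = \mathbf x^\top A\mathbf x$ with $A$ symmetric, the function name `complete_square_step` is hypothetical, and to avoid square roots the square is written as $p\,(\alpha \cdot \mathbf x)^2$ with $p = \pm a$ rather than as $\pm(\sqrt a\,x_i + \cdots)^2$.

```python
from fractions import Fraction

def complete_square_step(A, i):
    """One step of case (1) for Q(x) = x^T A x, with A symmetric and A[i][i] != 0.

    Returns (p, alpha, A1) such that Q(x) = p * (alpha . x)**2 + x^T A1 x,
    where alpha[i] == 1 and row and column i of A1 are zero, so that the
    remaining form Q1 no longer involves x_i.
    """
    n = len(A)
    p = Fraction(A[i][i])
    if p == 0:
        raise ValueError("no square term in x_i; case (2) of the proof applies")
    alpha = [Fraction(A[i][j]) / p for j in range(n)]
    A1 = [[Fraction(A[r][c]) - Fraction(A[i][r]) * Fraction(A[i][c]) / p
           for c in range(n)] for r in range(n)]
    return p, alpha, A1

# Q(x, y) = x^2 + 4xy + y^2, i.e. A = [[1, 2], [2, 1]]
p, alpha, A1 = complete_square_step([[1, 2], [2, 1]], 0)
print(p, alpha, A1)
```

For this sample form the step gives $Q(x, y) = (x + 2y)^2 - 3y^2$: one plus sign and one minus sign.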

A.11 Proof of Propositions 3.8.12 and 3.8.13 (Frenet Formulas)


Proposition 3.8.12 (Frenet frame). The point with coordinates $X, Y, Z$ (as in Equation 3.8.55) is the point

$$\mathbf a + X\,\mathbf t(0) + Y\,\mathbf n(0) + Z\,\mathbf b(0).$$

Equivalently, the vectors $\mathbf t(0), \mathbf n(0), \mathbf b(0)$ form the orthonormal basis (Frenet frame) with respect to which our adapted coordinates are computed.


Proposition 3.8.13 (Frenet frame related to curvature and torsion).

The Frenet frame satisfies the following equations, where $\kappa$ is the curvature of the curve at $\mathbf a$ and $\tau$ is its torsion:

$$\begin{aligned} \mathbf t'(0) &= \kappa\,\mathbf n(0)\\ \mathbf n'(0) &= -\kappa\,\mathbf t(0) + \tau\,\mathbf b(0)\\ \mathbf b'(0) &= -\tau\,\mathbf n(0). \end{aligned}$$


When Equation 3.8.55 first appeared we used dots (...) to denote the terms that can be ignored; here we are more specific, denoting these terms by $o(X^3)$.


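Proof. We may assume that $C$ is written in its adapted coordinates, i.e., as in Equation 3.8.55, which we repeat here:

$$\begin{aligned}
Y &= \frac12\sqrt{a_2^2 + b_2^2}\,X^2 + \frac{a_2a_3 + b_2b_3}{6\sqrt{a_2^2 + b_2^2}}X^3 + o(X^3) = \frac{A_2}{2}X^2 + \frac{A_3}{6}X^3 + o(X^3)\\
Z &= \frac{-b_2a_3 + a_2b_3}{6\sqrt{a_2^2 + b_2^2}}X^3 + o(X^3) = \frac{B_3}{6}X^3 + o(X^3).
\end{aligned} \tag{A11.1}$$

This means that we know (locally) the parametrization as a graph

$$\delta : X \mapsto \begin{bmatrix} X\\ \frac{A_2}{2}X^2 + \frac{A_3}{6}X^3 + o(X^3)\\ \frac{B_3}{6}X^3 + o(X^3) \end{bmatrix}, \tag{A11.2}$$

whose derivative at $X$ is

$$\delta'(X) = \begin{bmatrix} 1\\ A_2X + \frac{A_3}{2}X^2 + \cdots\\ \frac{B_3}{2}X^2 + \cdots \end{bmatrix}. \tag{A11.3}$$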
Parametrizing $C$ by arc length means calculating $X$ as a function of arc length $s$, or rather calculating the Taylor polynomial of $X(s)$ to degree 3. Equation 3.8.22 tells us how to compute $s(X)$; we will then need to invert this to find $X(s)$.


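Lemma A11.1. (a) The function

$$s(X) = \int_0^X \underbrace{\sqrt{1 + \Bigl(A_2t + \frac{A_3}{2}t^2\Bigr)^2 + \Bigl(\frac{B_3}{2}t^2\Bigr)^2 + o(t^2)}}_{\text{length of } \delta'(t)}\,dt \tag{A11.4}$$

has the Taylor polynomial

$$s(X) = X + \frac16 A_2^2X^3 + o(X^3). \tag{A11.5}$$

(b) The inverse function $X(s)$ has the Taylor polynomial

$$X(s) = s - \frac16 A_2^2s^3 + o(s^3) \quad\text{to degree 3.} \tag{A11.6}$$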

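Proof of Lemma A11.1. (a) Using the binomial formula (Equation 3.4.7), we have

$$\sqrt{1 + \Bigl(A_2t + \frac{A_3}{2}t^2\Bigr)^2 + \Bigl(\frac{B_3}{2}t^2\Bigr)^2 + o(t^2)} = 1 + \frac12 A_2^2t^2 + o(t^2) \tag{A11.7}$$

to degree 2, and integrating this gives

$$s(X) = \int_0^X \Bigl(1 + \frac12 A_2^2t^2 + o(t^2)\Bigr)dt = X + \frac16 A_2^2X^3 + o(X^3) \tag{A11.8}$$

to degree 3. This proves part (a).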

(b) The inverse function $X(s)$ has a Taylor polynomial; write it as $X(s) = \alpha s + \beta s^2 + \gamma s^3 + o(s^3)$, and use the equation $s(X(s)) = s$ and Equation A11.8 to write

$$\begin{aligned}
s(X(s)) &= X(s) + \frac16 A_2^2X(s)^3 + o(s^3)\\
&= \bigl(\alpha s + \beta s^2 + \gamma s^3 + o(s^3)\bigr) + \frac16 A_2^2\bigl(\alpha s + \beta s^2 + \gamma s^3 + o(s^3)\bigr)^3 + o(s^3)\\
&= s.
\end{aligned} \tag{A11.9}$$

Develop the cube and identify the coefficients of like powers to find

$$\alpha = 1, \quad \beta = 0, \quad \gamma = -\frac{A_2^2}{6}, \tag{A11.10}$$

which is the desired result, proving part (b) of Lemma A11.1.
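Both parts of Lemma A11.1 can be checked numerically on a model curve. The following Python sketch is an illustration only: the coefficient values `A2`, `A3`, `B3` are arbitrary sample numbers, the $o(X^3)$ terms of the curve are taken to be zero, and arc length is computed with a hand-written composite Simpson rule. The quantities `err_a` and `err_b` are the discrepancies divided by the cube of the variable, which should be small for small arguments.

```python
import math

A2, A3, B3 = 2.0, 1.0, 3.0   # hypothetical sample coefficients

def speed(t):
    """|delta'(t)| for the model curve (t, A2 t^2/2 + A3 t^3/6, B3 t^3/6)."""
    return math.sqrt(1 + (A2 * t + A3 * t**2 / 2) ** 2 + (B3 * t**2 / 2) ** 2)

def arc_length(X, n=200):
    """s(X) by the composite Simpson rule with n (even) subintervals."""
    step = X / n
    total = speed(0.0) + speed(X)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * speed(i * step)
    return total * step / 3

# Part (a): s(X) - (X + A2^2 X^3 / 6), relative to X^3.
X = 1e-2
err_a = abs(arc_length(X) - (X + A2**2 * X**3 / 6)) / X**3

# Part (b): with X(s) = s - A2^2 s^3 / 6, s(X(s)) - s relative to s^3.
s = 1e-2
X_of_s = s - A2**2 * s**3 / 6
err_b = abs(arc_length(X_of_s) - s) / s**3
print(err_a, err_b)
```

Both ratios shrink roughly in proportion to the argument, consistent with remainders in $o(X^3)$ and $o(s^3)$.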
Proof of Propositions 3.8.12 and 3.8.13, continued. Inserting the value of $X(s)$ given in Equation A11.6 into Equation A11.2 for the curve, we see that up to degree 3, the parametrization of our curve by arc length is given by

$$\begin{aligned}
X(s) &= s - \frac16 A_2^2s^3 + o(s^3)\\
Y(s) &= \frac12 A_2\Bigl(s - \frac16 A_2^2s^3\Bigr)^2 + \frac16 A_3s^3 + o(s^3) = \frac12 A_2s^2 + \frac16 A_3s^3 + o(s^3)\\
Z(s) &= \frac16 B_3s^3 + o(s^3).
\end{aligned} \tag{A11.11}$$

Differentiating these functions gives us the velocity vector

$$\mathbf t(s) = \begin{bmatrix} 1 - \frac{A_2^2}{2}s^2 + o(s^2)\\ A_2s + \frac12 A_3s^2 + o(s^2)\\ \frac{B_3}{2}s^2 + o(s^2) \end{bmatrix} \quad\text{to degree 2, hence}\quad \mathbf t(0) = \begin{bmatrix} 1\\ 0\\ 0 \end{bmatrix}. \tag{A11.12}$$


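Now we want to compute $\mathbf n(s)$. We have:

$$\mathbf n(s) = \frac{\mathbf t'(s)}{|\mathbf t'(s)|} = \frac{1}{|\mathbf t'(s)|}\begin{bmatrix} -A_2^2s + o(s)\\ A_2 + A_3s + o(s)\\ B_3s + o(s) \end{bmatrix}. \tag{A11.13}$$

We need to evaluate $|\mathbf t'(s)|$:

$$|\mathbf t'(s)| = \sqrt{A_2^4s^2 + A_2^2 + 2A_2A_3s + A_3^2s^2 + B_3^2s^2 + o(s)} = \sqrt{A_2^2 + 2A_2A_3s + o(s)}. \tag{A11.14}$$

Therefore,

$$\frac{1}{|\mathbf t'(s)|} = \bigl(A_2^2 + 2A_2A_3s + o(s)\bigr)^{-1/2} = \Bigl(A_2^2\Bigl(1 + \frac{2A_3s}{A_2} + o(s)\Bigr)\Bigr)^{-1/2} = \frac{1}{A_2}\Bigl(1 + \frac{2A_3s}{A_2}\Bigr)^{-1/2} + o(s). \tag{A11.15}$$

Again using the binomial theorem,

$$\frac{1}{|\mathbf t'(s)|} = \frac{1}{A_2}\Bigl(1 - \frac12\Bigl(\frac{2A_3}{A_2}\Bigr)s + o(s)\Bigr) = \frac{1}{A_2} - \frac{A_3}{A_2^2}s + o(s). \tag{A11.16}$$

So

$$\mathbf n(s) = \begin{bmatrix} \bigl(\frac{1}{A_2} - \frac{A_3}{A_2^2}s\bigr)(-A_2^2s) + o(s)\\ \bigl(\frac{1}{A_2} - \frac{A_3}{A_2^2}s\bigr)(A_2 + A_3s) + o(s)\\ \bigl(\frac{1}{A_2} - \frac{A_3}{A_2^2}s\bigr)(B_3s) + o(s) \end{bmatrix} = \begin{bmatrix} -A_2s + o(s)\\ 1 + o(s)\\ \frac{B_3}{A_2}s + o(s) \end{bmatrix}. \tag{A11.17}$$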
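Hence

$$\mathbf n(0) = \begin{bmatrix} 0\\ 1\\ 0 \end{bmatrix}$$

to degree 1, and

$$\mathbf b(0) = \mathbf t(0) \times \mathbf n(0) = \begin{bmatrix} 0\\ 0\\ 1 \end{bmatrix}. \tag{A11.18}$$

Moreover,

$$\mathbf n'(0) = \begin{bmatrix} -A_2\\ 0\\ B_3/A_2 \end{bmatrix} = -\kappa\,\mathbf t(0) + \tau\,\mathbf b(0), \quad\text{and}\quad \mathbf t'(0) = \kappa\,\mathbf n(0). \tag{A11.19}$$

Now all that remains is to prove that $\mathbf b'(0) = -\tau\,\mathbf n(0)$, i.e., $\mathbf b'(0) = -\dfrac{B_3}{A_2}\begin{bmatrix} 0\\ 1\\ 0 \end{bmatrix}$. Ignoring higher degree terms,

$$\mathbf b(s) = \mathbf t(s) \times \mathbf n(s) \approx \begin{bmatrix} 1\\ A_2s\\ 0 \end{bmatrix} \times \begin{bmatrix} -A_2s\\ 1\\ \frac{B_3}{A_2}s \end{bmatrix} = \begin{bmatrix} 0\\ -\frac{B_3}{A_2}s\\ 1 \end{bmatrix}. \tag{A11.20}$$

So

$$\mathbf b'(0) = \begin{bmatrix} 0\\ -\frac{B_3}{A_2}\\ 0 \end{bmatrix}. \quad\square \tag{A11.21}$$

Figure A12.1. The sum $\log 1 + \log 2 + \cdots + \log n$ is a midpoint Riemann sum for the integral $\int_{1/2}^{n+1/2} \log x\,dx$. The $k$th rectangle has the same area as the trapezoid whose top edge is tangent to the graph of $\log x$ at $\log n$, as illustrated when $k = 2$.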


A.12 Proof of the Central Limit Theorem

To explain why the central limit theorem is true, we will need to understand how the factorial $n!$ behaves as $n$ becomes large. How big is 100!? How many digits does it have? Stirling’s formula gives a very useful approximation.

Proposition A12.1 (Stirling’s formula). The number $n!$ is approximately

$$n! \approx \sqrt{2\pi}\left(\frac{n}{e}\right)^n\sqrt n,$$

in the sense that the ratio of the two sides tends to 1 as $n$ tends to $\infty$.

The difference between the areas of the trapezoids and the area under the graph of the logarithm is the shaded region. It has finite total area, as shown in Equation A12.2.

For instance,

$$\sqrt{2\pi}\,(100/e)^{100}\sqrt{100} \approx 9.3248 \cdot 10^{157} \quad\text{and}\quad 100! \approx 9.3326 \cdot 10^{157}, \tag{A12.1}$$

for a ratio of about 1.0008.
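The ratio can be recomputed in a few lines. The following Python sketch is an illustration only; the function name `stirling` is hypothetical, and ordinary floating-point arithmetic is used, which is adequate here because $100!$ is well within the range of a double.

```python
import math

def stirling(n):
    """Stirling's approximation sqrt(2*pi) * (n/e)**n * sqrt(n)."""
    return math.sqrt(2 * math.pi) * (n / math.e) ** n * math.sqrt(n)

# The ratio n! / stirling(n) should tend to 1 as n grows.
for n in (1, 10, 100):
    ratio = math.factorial(n) / stirling(n)
    print(n, ratio)

ratio_100 = math.factorial(100) / stirling(100)
```

The printed ratios decrease toward 1 as $n$ grows.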

Proof. Define the number $R_n$ by the formula

$$\log n! = \underbrace{\log 1 + \log 2 + \cdots + \log n}_{\text{midpoint Riemann sum}} = \int_{1/2}^{n+1/2} \log x\,dx + R_n. \tag{A12.2}$$

The second equality in Equation A12.3 comes from setting $x = n + t$ and writing

$$x = n + t = n\Bigl(1 + \frac tn\Bigr),$$

so

$$\log x = \log\bigl(n(1 + t/n)\bigr) = \log n + \log(1 + t/n)$$

and

$$\int_{-1/2}^{1/2} \log n\,dt = \log n.$$

The next is justified by

$$\int_{-1/2}^{1/2} \frac tn\,dt = 0.$$


(As illustrated by Figures A12.1 and A12.2, the left-hand side is a midpoint Riemann sum.) This formula is justified by the following computation, which shows that the $R_n$ form a convergent sequence:

$$\begin{aligned}
|R_n - R_{n-1}| &= \Bigl|\log n - \int_{n-1/2}^{n+1/2} \log x\,dx\Bigr| = \Bigl|\int_{-1/2}^{1/2} \log\Bigl(1 + \frac tn\Bigr)dt\Bigr|\\
&= \Bigl|\int_{-1/2}^{1/2} \Bigl(\log\Bigl(1 + \frac tn\Bigr) - \frac tn\Bigr)dt\Bigr| \le \int_{-1/2}^{1/2} 2\Bigl(\frac tn\Bigr)^2dt = \frac{1}{6n^2},
\end{aligned} \tag{A12.3}$$

so the series formed by the $R_n - R_{n-1}$ is convergent, and the sequence converges to some limit $R$. Thus we can rewrite Equation A12.2 as follows:


The last is Taylor’s theorem with remainder:

$$\log(1 + h) = h - \frac{1}{2(1 + c)^2}h^2$$

for some $c$ with $|c| < |h|$; in our case, $h = t/n$ with $t \in [-1/2, 1/2]$ and $c = -1/2$ is the worst value.


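Equation A12.5 comes from:

$$\log\Bigl(n + \frac12\Bigr) = \log\Bigl(n\Bigl(1 + \frac{1}{2n}\Bigr)\Bigr) = \log n + \log\Bigl(1 + \frac{1}{2n}\Bigr) = \log n + \frac{1}{2n} + O\Bigl(\frac{1}{n^2}\Bigr).$$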


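$$\begin{aligned}
\log n! &= \int_{1/2}^{n+1/2} \log x\,dx + R + \epsilon_1(n) = \bigl[x\log x - x\bigr]_{1/2}^{n+1/2} + R + \epsilon_1(n)\\
&= \Bigl(\bigl(n + \tfrac12\bigr)\log\bigl(n + \tfrac12\bigr) - \bigl(n + \tfrac12\bigr)\Bigr) - \Bigl(\tfrac12\log\tfrac12 - \tfrac12\Bigr) + R + \epsilon_1(n),
\end{aligned} \tag{A12.4}$$

where $\epsilon_1(n)$ tends to 0 as $n$ tends to $\infty$. Now notice that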


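$$\Bigl(n + \frac12\Bigr)\log\Bigl(n + \frac12\Bigr) = \Bigl(n + \frac12\Bigr)\log n + \frac12 + \epsilon_2(n), \tag{A12.5}$$

where $\epsilon_2(n)$ includes all the terms that tend to 0 as $n \to \infty$. Putting all this together, we see that there is a constant

$$c = -\Bigl(\frac12\log\frac12 - \frac12\Bigr) + R \tag{A12.6}$$

such that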


The epsilons $\epsilon_1(n)$ and $\epsilon_2(n)$ are unrelated, but both go to 0 as $n \to \infty$, as does $\epsilon(n) = \epsilon_1(n) + \epsilon_2(n)$.


$$\log n! = n\log n + \frac12\log n - n + c + \epsilon(n), \tag{A12.7}$$

where $\epsilon(n) \to 0$ as $n \to \infty$. Exponentiating this gives exactly Stirling’s formula, except for the determination of the constant $C$:

$$n! = C\,n^n e^{-n}\sqrt n\,\underbrace{e^{\epsilon(n)}}_{\to 1 \text{ as } n \to \infty}, \tag{A12.8}$$

where $C = e^c$. There isn’t any obvious reason why it should be possible to evaluate $C$ exactly, but it turns out that $C = \sqrt{2\pi}$; we will derive this at the end of this subsection, using a result from Section 4.11 and a result developed below. Another way of deriving this is presented in Exercise 12.1. $\square$


Proving the central limit theorem 

We now prove the following version of the central limit theorem. 


Theorem A12.2. If a fair coin is tossed $2n$ times, the probability that the number of heads is between $n + a\sqrt n$ and $n + b\sqrt n$ tends to

$$\frac{1}{\sqrt\pi}\int_a^b e^{-t^2}\,dt \tag{A12.9}$$

as $n$ tends to $\infty$.
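The statement can be explored numerically before it is proved. The following Python sketch is an illustration only: the function names `heads_probability` and `gaussian_limit` are hypothetical, the binomial probability is computed exactly from integer binomial coefficients, and the limit is evaluated with the error function, using $\frac{1}{\sqrt\pi}\int_a^b e^{-t^2}dt = \frac12(\operatorname{erf}(b) - \operatorname{erf}(a))$.

```python
import math

def heads_probability(n, a, b):
    """P(n + a*sqrt(n) <= heads <= n + b*sqrt(n)) for 2n fair tosses, exactly."""
    lo = math.ceil(a * math.sqrt(n))
    hi = math.floor(b * math.sqrt(n))
    favourable = sum(math.comb(2 * n, n + k) for k in range(lo, hi + 1))
    return favourable / 4 ** n   # 2^(2n) equally likely outcomes

def gaussian_limit(a, b):
    """(1/sqrt(pi)) * integral of exp(-t^2) from a to b, via the error function."""
    return (math.erf(b) - math.erf(a)) / 2

a, b = -1.0, 1.0
for n in (50, 500, 2000):
    print(n, heads_probability(n, a, b), gaussian_limit(a, b))
```

The exact probabilities approach the limiting value as $n$ grows; the convergence is slow because the endpoints $n + a\sqrt n$ and $n + b\sqrt n$ are rounded to integers.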


Proof. The probability of having between $n + a\sqrt n$ and $n + b\sqrt n$ heads is exactly

$$\frac{1}{2^{2n}}\sum_{k=a\sqrt n}^{b\sqrt n}\binom{2n}{n+k} = \frac{1}{2^{2n}}\sum_{k=a\sqrt n}^{b\sqrt n}\frac{(2n)!}{(n+k)!\,(n-k)!}. \tag{A12.10}$$


The idea is to rewrite the sum on the right, using Stirling’s formula, cancel everything we can, and see that what is left is a Riemann sum for the integral in Equation A12.9 (more precisely, $1/\sqrt\pi$ times that Riemann sum).

Let us begin by writing $k = t\sqrt n$, so that the sum is over those values of $t$ between $a$ and $b$ such that $t\sqrt n$ is an integer; we will denote this set by $T_{[a,b]}$. These points are regularly spaced, $1/\sqrt n$ apart, between $a$ and $b$, and hence are good candidates for the points at which to evaluate a function when forming a Riemann sum. With this notation, our sum becomes

$$\frac{1}{2^{2n}}\sum_{t \in T_{[a,b]}}\frac{(2n)!}{(n + t\sqrt n)!\,(n - t\sqrt n)!} \approx \frac{1}{2^{2n}}\sum_{t \in T_{[a,b]}}\frac{C(2n)^{2n}e^{-2n}\sqrt{2n}}{\Bigl(C(n + t\sqrt n)^{(n+t\sqrt n)}e^{-(n+t\sqrt n)}\sqrt{n + t\sqrt n}\Bigr)\Bigl(C(n - t\sqrt n)^{(n-t\sqrt n)}e^{-(n-t\sqrt n)}\sqrt{n - t\sqrt n}\Bigr)}. \tag{A12.11}$$


Now for some of the cancellations: $(2n)^{2n} = 2^{2n}n^{2n}$, and the powers of 2 cancel with the fraction in front of the sum. Also, all the exponential terms cancel, since $e^{-(n+t\sqrt n)}e^{-(n-t\sqrt n)} = e^{-2n}$. Also, one power of $C$ cancels. This leaves

$$\frac1C\sum_{t \in T_{[a,b]}}\frac{n^{2n}\sqrt{2n}}{\sqrt{n^2 - t^2n}\,(n + t\sqrt n)^{(n+t\sqrt n)}(n - t\sqrt n)^{(n-t\sqrt n)}}. \tag{A12.12}$$


Next, write (n 4- t v /n) (n+ * v/ ” ) = n (n+ ^(l + t/y/n) (n+t ^\ and similarly 
for the term in n - ty/n , note that the powers of n cancel with the n 2n in the 
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We denote by A t the spacing of 
the points i.e., 1 /y/n. 


In the third line of Equation 

A12.15, the denominator of the 

2 

first term tends to e" 1 a s n —* 
oo, by Equation A12.14. By the 
same equation, the numerator of 
the second term tends to e~ t , 
and the denominator of the second 
term tends to e} . 


numerator, to find 


1 y-v / 2n 

~c 




base 


t 2 n (J +t/ y /n)< n+t '/"H 1 - t/y/n)^- 1 ^ ' 

‘ ' 

height of rectangles for Riemann sum 


A12.13 


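The term under the square root converges to $\sqrt{2/n} = \sqrt2\,\Delta t$, so it is the length of the base of the rectangles we need for our Riemann sum. For the other, remember that

$$\lim_{x\to\infty}\left(1+\frac ax\right)^x = e^a.\tag{A12.14}$$

We use Equation A12.14 repeatedly in the following calculation:

$$\begin{aligned}
&\frac{1}{(1+t/\sqrt n)^{(n+t\sqrt n)}(1-t/\sqrt n)^{(n-t\sqrt n)}}\\
&\qquad=\frac{1}{(1+t/\sqrt n)^{n}(1+t/\sqrt n)^{t\sqrt n}(1-t/\sqrt n)^{n}(1-t/\sqrt n)^{-t\sqrt n}}\\
&\qquad=\frac{1}{(1-t^2/n)^{n}}\cdot\frac{(1-t/\sqrt n)^{t\sqrt n}}{(1+t/\sqrt n)^{t\sqrt n}}\\
&\qquad\to\frac{1}{e^{-t^2}}\cdot\frac{e^{-t^2}}{e^{t^2}} = e^{-t^2}.
\end{aligned}\tag{A12.15}$$

Putting this together, we see that

$$\frac1C\sum_{t\in T_{[a,b]}}\sqrt{\frac{2n}{n^2-t^2n}}\;\frac{1}{(1+t/\sqrt n)^{(n+t\sqrt n)}(1-t/\sqrt n)^{(n-t\sqrt n)}}\tag{A12.16}$$

converges to

$$\frac{\sqrt2}{C}\,\frac{1}{\sqrt n}\sum_{t\in T_{[a,b]}}e^{-t^2},\tag{A12.17}$$

which is the desired Riemann sum. Thus as $n\to\infty$,

$$\frac{1}{2^{2n}}\sum_{k=n+a\sqrt n}^{n+b\sqrt n}\binom{2n}{k}\to\frac{\sqrt2}{C}\int_a^b e^{-t^2}\,dt.\tag{A12.18}$$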
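As a numerical sanity check of the limit in Equation A12.15, the following minimal Python sketch evaluates the height of the Riemann-sum rectangles for increasing $n$ and compares it with $e^{-t^2}$. The function name `height` and the sample values of $n$ and $t$ are illustrative choices, not part of the text; logarithms are used only to avoid overflow.

```python
import math

def height(n: int, t: float) -> float:
    """Height 1 / ((1+t/sqrt(n))^(n+t*sqrt(n)) * (1-t/sqrt(n))^(n-t*sqrt(n))).

    Computed through logarithms so that large n does not overflow.
    Requires |t| < sqrt(n).
    """
    root = math.sqrt(n)
    s = t / root
    log_denominator = (n + t * root) * math.log1p(s) + (n - t * root) * math.log1p(-s)
    return math.exp(-log_denominator)

if __name__ == "__main__":
    t = 1.0
    for n in (10**2, 10**4, 10**6):
        # The two printed values should agree more and more closely as n grows.
        print(n, height(n, t), math.exp(-t * t))
```

The agreement improves as $n$ grows, which is what the repeated use of Equation A12.14 predicts.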

We finally need to invoke a fact justified in Section 4.11 (Equation 4.11.51): 


$$\int_{-\infty}^{\infty}e^{-t^2}\,dt = \sqrt{\pi}.\tag{A12.19}$$


Now since when $a = -\infty$ and $b = +\infty$ we must have

$$\frac{\sqrt2}{C}\int_{-\infty}^{\infty}e^{-t^2}\,dt = \frac{\sqrt2}{C}\sqrt{\pi} = 1,\tag{A12.20}$$

we see that $C = \sqrt{2\pi}$, and finally

$$\frac{1}{2^{2n}}\sum_{k=n+a\sqrt n}^{n+b\sqrt n}\binom{2n}{k}\to\frac{1}{\sqrt{\pi}}\int_a^b e^{-t^2}\,dt.\quad\square\tag{A12.21}$$
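The conclusion A12.21 can be checked numerically. The sketch below, under the assumption that the sum runs over the integers $k$ with $n+a\sqrt n\le k\le n+b\sqrt n$, adds the binomial coefficients exactly and compares the result with $\frac{1}{\sqrt\pi}\int_a^b e^{-t^2}dt = \frac12(\operatorname{erf}(b)-\operatorname{erf}(a))$. The function names are hypothetical, chosen for this illustration.

```python
import math

def central_binomial_mass(n: int, a: float, b: float) -> float:
    """(1/2^(2n)) * sum of C(2n, k) over integers k in [n + a*sqrt(n), n + b*sqrt(n)]."""
    lo = math.ceil(n + a * math.sqrt(n))
    hi = math.floor(n + b * math.sqrt(n))
    total = sum(math.comb(2 * n, k) for k in range(lo, hi + 1))
    return total / 4**n  # exact integers, a single division at the end

def gaussian_side(a: float, b: float) -> float:
    """(1/sqrt(pi)) * integral from a to b of exp(-t^2) dt, via the error function."""
    return (math.erf(b) - math.erf(a)) / 2

if __name__ == "__main__":
    for n in (50, 500, 2000):
        print(n, central_binomial_mass(n, -1.0, 1.0), gaussian_side(-1.0, 1.0))
```

For finite $n$ the two sides differ slightly, because the endpoints $n\pm\sqrt n$ are rounded to integers; the difference shrinks as $n$ grows.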


A.13 Proof of Fubini’s Theorem

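Theorem 4.5.8 (Fubini’s theorem). Let $f$ be an integrable function on $\mathbb R^n\times\mathbb R^m$, and suppose that for each $\mathbf x\in\mathbb R^n$, the function $\mathbf y\mapsto f(\mathbf x,\mathbf y)$ is integrable. Then the function

$$\mathbf x\mapsto\int_{\mathbb R^m}f(\mathbf x,\mathbf y)\,|d^m\mathbf y|$$

is integrable, and

$$\int_{\mathbb R^{n+m}}f(\mathbf x,\mathbf y)\,|d^n\mathbf x|\,|d^m\mathbf y| = \int_{\mathbb R^n}\left(\int_{\mathbb R^m}f(\mathbf x,\mathbf y)\,|d^m\mathbf y|\right)|d^n\mathbf x|.$$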


In fact, we will prove a stronger theorem: it turns out that the assumption “that for each $\mathbf x\in\mathbb R^n$, the function $\mathbf y\mapsto f(\mathbf x,\mathbf y)$ is integrable” is not really necessary. But we need to be careful; it is not quite true that just because $f$ is integrable, the function $\mathbf y\mapsto f(\mathbf x,\mathbf y)$ is integrable, and we can’t simply remove that hypothesis. The following example illustrates the difficulty.


By “rough statement” we mean 
Equation 4.5.1: 



Example A13.1 (A case where the rough statement of Fubini’s theorem does not work). Consider the function $f\begin{pmatrix}x\\y\end{pmatrix}$ that equals 0 outside the unit square, and 1 both inside the square and on its boundary, except for the boundary where $x = 1$. On that boundary, $f = 1$ when $y$ is rational, and $f = 0$ when $y$ is irrational:

$$f\begin{pmatrix}x\\y\end{pmatrix} = \begin{cases}1&\text{if }0\le x<1\text{ and }0\le y\le1\\1&\text{if }x=1\text{ and }y\text{ is rational}\\0&\text{otherwise.}\end{cases}\tag{A13.1}$$

Following the procedure we used in Section 4.5, we write the double integral

$$\iint f\begin{pmatrix}x\\y\end{pmatrix}dx\,dy = \int_0^1\left(\int_0^1 f\begin{pmatrix}x\\y\end{pmatrix}dy\right)dx.\tag{A13.2}$$

However, the inner integral $F(1) = \int_0^1 f\begin{pmatrix}1\\y\end{pmatrix}dy$ does not make sense. Our function $f$ is integrable on $\mathbb R^2$, but $f\begin{pmatrix}1\\y\end{pmatrix}$ is not an integrable function of $y$. △
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In fact, the function F could be 
undefined on a much more complicated set than a single point, but this set will necessarily have volume 0, so it doesn’t affect the integral $\int_{\mathbb R}F(x)\,dx$.


For example, if we have an integrable function $f\begin{pmatrix}x_1\\x_2\\y\end{pmatrix}$, we can think of it as a function on $\mathbb R^2\times\mathbb R$, where we consider $x_1$ and $x_2$ as the horizontal variables and $y$ as the vertical variable.



Here we imagine that the $x$ and $y$ variables are horizontal and the $z$ variable is vertical. Fixing a value of the horizontal variable picks out a French fry, and choosing a value of the vertical variable chooses a flat potato chip.


Fortunately, the fact that $F(1)$ is not defined is not a serious problem: since a point has one-dimensional volume 0, you could define $F(1)$ to be anything you want, without affecting the integral $\int_0^1F(x)\,dx$. This always happens: if $f:\mathbb R^{n+m}\to\mathbb R$ is integrable, then $\mathbf y\mapsto f(\mathbf x,\mathbf y)$ is always integrable except for a set of $\mathbf x$ of volume 0, which doesn’t matter. We deal with this problem by using upper integrals and lower integrals for the inner integral.

Suppose we have a function $f:\mathbb R^{n+m}\to\mathbb R$, and that $\mathbf x\in\mathbb R^n$ denotes the first $n$ variables of the domain and $\mathbf y\in\mathbb R^m$ denotes the last $m$ variables. We will think of the $\mathbf x$ variables as “horizontal” and the $\mathbf y$ variables as “vertical.” We denote by $f_{\mathbf x}$ the restriction of $f$ to the vertical subset where the horizontal coordinate is fixed to be $\mathbf x$, and by $f^{\mathbf y}$ the restriction of the function to the horizontal subset where the vertical coordinate is fixed at $\mathbf y$. With $f_{\mathbf x}(\mathbf y)$ we hold the “horizontal” variables constant and look at the values of the vertical variables. You may imagine a bin filled with infinitely thin vertical sticks. At each point $\mathbf x$ there is a stick representing all the values of $\mathbf y$.

With $f^{\mathbf y}$ we hold the “vertical” variables constant, and look at the values of the horizontal variables. Here we imagine the bin filled with infinitely thin sheets of paper; for each value of $\mathbf y$ there is a single sheet, representing the values of $\mathbf x$. Either way, the entire bin is filled:

$$f_{\mathbf x}(\mathbf y) = f^{\mathbf y}(\mathbf x) = f(\mathbf x,\mathbf y).\tag{A13.3}$$

Alternatively, as shown in Figure A13.1, we can imagine slicing a potato vertically into French fries, or horizontally into potato chips.

As we saw in Example A13.1, it is unfortunately not true that if $f$ is integrable, then $f_{\mathbf x}$ and $f^{\mathbf y}$ are also integrable for every $\mathbf x$ and $\mathbf y$. But the following is true:


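Theorem A13.2 (Fubini’s theorem). Let $f$ be an integrable function on $\mathbb R^n\times\mathbb R^m$. Then the four functions

$$U(f_{\mathbf x}),\quad L(f_{\mathbf x}),\quad U(f^{\mathbf y}),\quad L(f^{\mathbf y})\tag{A13.4}$$

are all integrable, and

$$\begin{aligned}
\underbrace{\int_{\mathbb R^n}U(f_{\mathbf x})\,|d^n\mathbf x|}_{\text{adding upper sums for all columns}}
&=\underbrace{\int_{\mathbb R^n}L(f_{\mathbf x})\,|d^n\mathbf x|}_{\text{adding lower sums for all columns}}\\
=\underbrace{\int_{\mathbb R^m}U(f^{\mathbf y})\,|d^m\mathbf y|}_{\text{adding upper sums for all rows}}
&=\underbrace{\int_{\mathbb R^m}L(f^{\mathbf y})\,|d^m\mathbf y|}_{\text{adding lower sums for all rows}}\\
&=\underbrace{\int_{\mathbb R^n\times\mathbb R^m}f\,|d^n\mathbf x|\,|d^m\mathbf y|}_{\text{integral of }f}.
\end{aligned}\tag{A13.5}$$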
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Of course, the same idea holds 
in $\mathbb R^3$: integrating over all the French fries and adding them up gives the same result as integrating over all the potato chips and adding them.

Equation A13.7: The first line is just the definition of an upper sum.

To go from the first to the second line, note that the decomposition of $\mathbb R^n\times\mathbb R^m$ into $C_1\times C_2$ with $C_1\in\mathcal D_N(\mathbb R^n)$ and $C_2\in\mathcal D_{N'}(\mathbb R^m)$ is finer than $\mathcal D_N(\mathbb R^{n+m})$.

For the third line, consider what we are doing: for each $C_1\in\mathcal D_N(\mathbb R^n)$ we choose a point $\mathbf x\in C_1$, and for each $C_2\in\mathcal D_{N'}(\mathbb R^m)$ we find the $\mathbf y\in C_2$ such that $f(\mathbf x,\mathbf y)$ is maximal, and add these maxima. These maxima are restricted to all have the same x-coordinate, so they are at most $M_{C_1\times C_2}(f)$, and even if we now maximize over all $\mathbf x\in C_1$, we will still find less than if we had added the maxima independently; equality will occur only if all the maxima are above each other (i.e., all have the same x-coordinate).


Corollary A13.3. The set of $\mathbf x$ such that $U(f_{\mathbf x})\ne L(f_{\mathbf x})$ has volume 0. The set of $\mathbf y$ such that $U(f^{\mathbf y})\ne L(f^{\mathbf y})$ has volume 0.

In particular, the set of $\mathbf x$ such that $f_{\mathbf x}$ is not integrable has n-dimensional volume 0, and similarly, the set of $\mathbf y$ where $f^{\mathbf y}$ is not integrable has m-dimensional volume 0.

Proof of Corollary A13.3. If these volumes were not 0, the first and third equalities of Equation A13.5 would not be true. □

Proof of Theorem A13.2. The underlying idea is straightforward. Consider a double integral over some bounded domain in $\mathbb R^2$. For every $N$, we have to sum over all the squares of some dyadic decomposition of the plane. These squares can be taken in any order, since only finitely many contribute a nonzero term (because the domain is bounded). Adding together the entries of each column and then adding the totals is like integrating $f_{\mathbf x}$; adding together the entries of each row and then adding the totals together is like integrating $f^{\mathbf y}$, as illustrated in Figure A13.2.


      1     5                                  1 + 5
      2     6                                  2 + 6
      3     7     gives the same result as     3 + 7
    + 4   + 8                                  4 + 8
     10  + 26

FIGURE A13.2. To the left, we sum entries of each column and add the totals; this is like integrating $f_{\mathbf x}$. To the right, we sum entries of each row and add the totals; this is like integrating $f^{\mathbf y}$.
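The bookkeeping in Figure A13.2 can be reproduced in a few lines. This is only a sketch of the finite-sum idea behind the proof, using the eight entries of the figure; the variable names are illustrative.

```python
# Entries of Figure A13.2, stored row by row: each inner list is one row.
grid = [
    [1, 5],
    [2, 6],
    [3, 7],
    [4, 8],
]

# Sum each column first, then add the column totals (like integrating f_x).
column_totals = [sum(row[j] for row in grid) for j in range(len(grid[0]))]

# Sum each row first, then add the row totals (like integrating f^y).
row_totals = [sum(row) for row in grid]

if __name__ == "__main__":
    print(column_totals, sum(column_totals))
    print(row_totals, sum(row_totals))
```

Both orders of summation give the same grand total because only finitely many terms are involved; the proof below handles the passage to the limit.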


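Putting this in practice requires a little attention to limits. The inequality that makes things work is that for any $N' > N$, we have (Lemma 4.1.7)

$$U_N(f)\ge U_N(U_{N'}(f_{\mathbf x})).\tag{A13.6}$$

Indeed,

$$\begin{aligned}
U_N(f)&=\sum_{C\in\mathcal D_N(\mathbb R^n\times\mathbb R^m)}M_C(f)\operatorname{vol}_{n+m}C\\
&\ge\sum_{C_1\in\mathcal D_N(\mathbb R^n)}\ \sum_{C_2\in\mathcal D_{N'}(\mathbb R^m)}M_{C_1\times C_2}(f)\operatorname{vol}_nC_1\operatorname{vol}_mC_2\\
&\ge\sum_{C_1\in\mathcal D_N(\mathbb R^n)}M_{C_1}\Bigl(\sum_{C_2\in\mathcal D_{N'}(\mathbb R^m)}M_{C_2}(f_{\mathbf x})\operatorname{vol}_mC_2\Bigr)\operatorname{vol}_nC_1.
\end{aligned}\tag{A13.7}$$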
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An analogous argument about lower sums gives 


In Equations A13.8 and A13.9, expressions like $U_N(U_{N'}(f_{\mathbf x}))$ and $L_N(L(f_{\mathbf x}))$ may seem strange, but note that $U_{N'}(f_{\mathbf x})$ and $L(f_{\mathbf x})$ are just functions of $\mathbf x$, bounded with bounded support, so we can take Nth upper or lower sums of them.


$$U_N(f)\ge U_N(U_{N'}(f_{\mathbf x}))\ge L_N(L_{N'}(f_{\mathbf x}))\ge L_N(f).\tag{A13.8}$$

Since $f$ is integrable, we can make $U_N(f)$ and $L_N(f)$ arbitrarily close, by choosing $N$ sufficiently large; we can squeeze the two ends of Equation A13.8 together, squeezing everything inside in the process. This is what we are going to do.

The limits as $N'\to\infty$ of $U_{N'}(f_{\mathbf x})$ and $L_{N'}(f_{\mathbf x})$ are the upper and lower integrals $U(f_{\mathbf x})$ and $L(f_{\mathbf x})$ (by Definition 4.1.9), so we can rewrite Equation A13.8:

$$U_N(f)\ge U_N(U(f_{\mathbf x}))\ge L_N(L(f_{\mathbf x}))\ge L_N(f).\tag{A13.9}$$

Given a function $f$, $U(f)\ge L(f)$; in addition, if $f\ge g$, then $U_N(f)\ge U_N(g)$. So we see that $U_N(L(f_{\mathbf x}))$ and $L_N(U(f_{\mathbf x}))$ are between the inner values of Equation A13.9:


We don’t know which is bigger, $U_N(L(f_{\mathbf x}))$ or $L_N(U(f_{\mathbf x}))$, but that doesn’t matter. We know they are between the first and last terms of Equation A13.9, which themselves have a common limit as $N\to\infty$.


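$$\begin{aligned}
U_N(U(f_{\mathbf x}))&\ge U_N(L(f_{\mathbf x}))\ge L_N(L(f_{\mathbf x}))\\
U_N(U(f_{\mathbf x}))&\ge L_N(U(f_{\mathbf x}))\ge L_N(L(f_{\mathbf x})).
\end{aligned}\tag{A13.10}$$

So $U_N(L(f_{\mathbf x}))$ and $L_N(L(f_{\mathbf x}))$ have a common limit, as do $U_N(U(f_{\mathbf x}))$ and $L_N(U(f_{\mathbf x}))$, showing that both $L(f_{\mathbf x})$ and $U(f_{\mathbf x})$ are integrable, and their integrals are equal, since they are both equal to

$$\int_{\mathbb R^n\times\mathbb R^m}f.\tag{A13.11}$$

The argument about the functions $f^{\mathbf y}$ is similar. □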


A.14 Justifying the Use of Other Pavings

Here we prove Theorem 4.7.5, which says that we are not restricted to dyadic pavings when computing integrals.


Theorem 4.7.5. Let $X\subset\mathbb R^n$ be a bounded subset, and $\mathcal P_N$ be a nested partition of $X$. If the boundary $\partial X$ satisfies $\operatorname{vol}_n(\partial X) = 0$, and $f:\mathbb R^n\to\mathbb R$ is integrable, then the limits

$$\lim_{N\to\infty}U_{\mathcal P_N}(f)\quad\text{and}\quad\lim_{N\to\infty}L_{\mathcal P_N}(f)\tag{4.7.4}$$

both exist, and are equal to

$$\int_Xf(\mathbf x)\,|d^n\mathbf x|.\tag{4.7.5}$$

Proof. Since the boundary of $X$ has volume 0, the characteristic function $\chi_X$ is integrable, and we may replace $f$ by $\chi_Xf$, and suppose that the support of $f$ is in $X$. We need to prove that for any $\epsilon$, we can find $M$ such that

$$U_{\mathcal P_M}(f)-L_{\mathcal P_M}(f)<\epsilon.\tag{A14.1}$$


Why the 8 in the denomina- 
tor? Because it will give us the re- 
sult we want; the ends justify the 
means. 


This is where we use the fact 
that the diameters of the tiles go 
to 0. 


Since we know that the analogous statement for dyadic pavings is true, the idea 
of the proof is to use “other pavings” small enough so that each paving piece 
P will either be entirely inside a dyadic cube, or (if it touches or intersects 
a boundary between dyadic cubes) will contribute a negligible amount to the 
upper and lower sums. 

First, using the fact that $f$ is integrable, find $N$ such that the difference between upper and lower sums of dyadic decompositions is less than $\epsilon/2$:

$$U_N(f)-L_N(f)<\frac\epsilon2.\tag{A14.2}$$

Next, find $N' > N$ such that if $L$ is the union of the cubes $C\in\mathcal D_{N'}$ whose closures intersect $\partial\mathcal D_N$, then the contribution of $\operatorname{vol}_nL$ to the integral of $f$ is negligible. We do this by finding $N'$ such that

$$\operatorname{vol}_nL\le\frac{\epsilon}{8\sup|f|}.\tag{A14.3}$$


Now, find $N''$ such that every $P\in\mathcal P_{N''}$ either is entirely contained in $L$, or is entirely contained in some $C\in\mathcal D_N$, or both.

We claim that this $N''$ works, in the sense that

$$U_{\mathcal P_{N''}}(f)-L_{\mathcal P_{N''}}(f)<\epsilon,\tag{A14.4}$$

but it takes a bit of doing to prove it.

Every $\mathbf x$ is contained in some dyadic cube $C$. Let $C_N(\mathbf x)$ be the cube at level $N$ that contains $\mathbf x$. Now define the function $\bar f$ that assigns to each $\mathbf x$ the maximum of the function over its cube:

$$\bar f(\mathbf x) = M_{C_N(\mathbf x)}(f).\tag{A14.5}$$

Similarly, every $\mathbf x$ is in some paving tile $P$. Let $P_M(\mathbf x)$ be the paving tile at level $M$ that contains $\mathbf x$, and define the function $\bar g$ that assigns to each $\mathbf x$ the maximum of the function over its paving tile $P$ if $P$ is entirely within a dyadic cube at level $N$, and minus the sup of $|f|$ if $P$ intersects the boundary of a dyadic cube:

$$\bar g(\mathbf x) = \begin{cases}M_{P_{N''}(\mathbf x)}(f)&\text{if }P_{N''}(\mathbf x)\cap\partial\mathcal D_N = \emptyset\\-\sup|f|&\text{otherwise.}\end{cases}\tag{A14.6}$$

Then $\bar g\le\bar f$; hence

$$\int_{\mathbb R^n}\bar g\,|d^n\mathbf x|\le\int_{\mathbb R^n}\bar f\,|d^n\mathbf x| = U_N(f).\tag{A14.7}$$

Now we compute the upper sum $U_{\mathcal P_{N''}}(f)$, as follows:

On the far right of the second line of Equation A14.8 we add $-\sup|f|+\sup|f| = 0$ to $M_P(f)$.


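$$\begin{aligned}
U_{\mathcal P_{N''}}(f)&=\sum_{P\in\mathcal P_{N''}}M_P(f)\operatorname{vol}_nP\\
&=\underbrace{\sum_{\substack{P\in\mathcal P_{N''},\\P\cap\partial\mathcal D_N=\emptyset}}M_P(f)\operatorname{vol}_nP}_{\substack{\text{contribution from }P\\\text{entirely in dyadic cubes}}}
+\underbrace{\sum_{\substack{P\in\mathcal P_{N''},\\P\cap\partial\mathcal D_N\ne\emptyset}}\bigl(M_P(f)\underbrace{-\sup|f|+\sup|f|}_{\text{cancels out}}\bigr)\operatorname{vol}_nP}_{\substack{\text{contribution from }P\text{ that intersect}\\\text{the boundary of dyadic cubes}}}.
\end{aligned}\tag{A14.8}$$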


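Now we make two sums out of the single sum on the far right:

$$\sum_{\substack{P\in\mathcal P_{N''},\\P\cap\partial\mathcal D_N\ne\emptyset}}(-\sup|f|)\operatorname{vol}_nP+\sum_{\substack{P\in\mathcal P_{N''},\\P\cap\partial\mathcal D_N\ne\emptyset}}\bigl(M_P(f)+\sup|f|\bigr)\operatorname{vol}_nP,\tag{A14.9}$$

and add the first sum to the sum giving the contribution from $P$ entirely in dyadic cubes, to get the integral of $\bar g$: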


Since $M_P(f)$ is the least upper bound just over $P$ while $\sup|f|$ is the least upper bound over all of $\mathbb R^n$, we have $M_P(f)+\sup|f|\le2\sup|f|$.


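$$\sum_{\substack{P\in\mathcal P_{N''},\\P\cap\partial\mathcal D_N=\emptyset}}M_P(f)\operatorname{vol}_nP+\sum_{\substack{P\in\mathcal P_{N''},\\P\cap\partial\mathcal D_N\ne\emptyset}}(-\sup|f|)\operatorname{vol}_nP = \int_{\mathbb R^n}\bar g\,|d^n\mathbf x|.\tag{A14.10}$$

We can rewrite Equation A14.8 as:

$$U_{\mathcal P_{N''}}(f) = \int_{\mathbb R^n}\bar g\,|d^n\mathbf x|+\sum_{\substack{P\in\mathcal P_{N''},\\P\cap\partial\mathcal D_N\ne\emptyset}}\underbrace{\bigl(M_P(f)+\sup|f|\bigr)}_{\le2\sup|f|\text{ (see note in margin)}}\operatorname{vol}_nP.\tag{A14.11}$$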

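Using Equation A14.3 to give an upper bound on the volume of the paving pieces $P$ that intersect the boundary, we get

$$U_{\mathcal P_{N''}}(f)-\int_{\mathbb R^n}\bar g\,|d^n\mathbf x|\le2\sup|f|\operatorname{vol}_nL\le2\sup|f|\,\frac{\epsilon}{8\sup|f|} = \frac\epsilon4.\tag{A14.12}$$

Equation A14.7 then gives us

$$U_{\mathcal P_{N''}}(f)-\underbrace{\int_{\mathbb R^n}\bar g\,|d^n\mathbf x|}_{\le U_N(f)}\le\frac\epsilon4,\quad\text{so}\quad U_{\mathcal P_{N''}}(f)\le U_N(f)+\frac\epsilon4.\tag{A14.13}$$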


An exactly analogous argument leads to

$$L_{\mathcal P_{N''}}(f)\ge L_N(f)-\frac\epsilon4,\quad\text{i.e.,}\quad -L_{\mathcal P_{N''}}(f)\le-L_N(f)+\frac\epsilon4.\tag{A14.14}$$

Adding these together and using Equation A14.2, we get

$$U_{\mathcal P_{N''}}(f)-L_{\mathcal P_{N''}}(f)\le U_N(f)-L_N(f)+\frac\epsilon2<\epsilon.\quad\square\tag{A14.15}$$

A.15 Existence and Uniqueness of the Determinant


This is a messy and uninspir- 
ing exercise in the use of induc- 
tion; students willing to accept the 
theorem on faith may wish to skip 
the proof, or save it for a rainy day. 

The three $n\times n$ matrices of Equation A15.1 are identical except for the kth column, which are respectively $\mathbf a_k$, $\mathbf b$ and $\mathbf c$.


We can see from Equation 4.8.9 
that exchanging the columns of a 
2x2 matrix changes the sign of D. 

We can restrict ourselves to $k = 2$ because if $k > 2$, say, $k = 5$, then we could switch $j = 1$ and $k$ with a total of three exchanges: one to exchange the kth and the second position, one to exchange positions 1 and 2, and a third to exchange position 2 and the fifth position again. By our argument above we know that the first and third exchanges would each change the sign of the determinant, resulting in no net change; the only exchange that “counts” is the change of the first and second positions.

In the next-to-last line of Equation A15.4, $A_{i,1} = B_{i,1} = C_{i,1}$ because the matrices $A$, $B$, and $C$ are identical except for the first column, which is erased to produce $A_{i,1}$, $B_{i,1}$ and $C_{i,1}$.


Theorem 4.8.4 (Existence and uniqueness of determinants). There 
exists a function det A satisfying the three properties of the determinant, 
and it is unique. 


Uniqueness is proved in Section 4.8; here we prove existence. We will verify that the function $D(A)$, the development along the first column, does indeed satisfy properties (1), (2), and (3) for the determinant $\det A$.

(1) Multilinearity 

Let $\mathbf b,\mathbf c\in\mathbb R^n$, and suppose $\mathbf a_k = \beta\mathbf b+\gamma\mathbf c$. Set

$$\begin{aligned}
A&=[\mathbf a_1,\dots,\mathbf a_k,\dots,\mathbf a_n],\\
B&=[\mathbf a_1,\dots,\mathbf b,\dots,\mathbf a_n],\\
C&=[\mathbf a_1,\dots,\mathbf c,\dots,\mathbf a_n].
\end{aligned}\tag{A15.1}$$

The object is to show that

$$D(A) = \beta D(B)+\gamma D(C).\tag{A15.2}$$

We need to distinguish two cases: $k = 1$ (i.e., $k$ is the first column) and $k > 1$.

The case $k > 1$ is proved by induction. Clearly multilinearity is true for $D$’s of $1\times1$ matrices, which are just numbers. We will suppose multilinearity is true for $D$’s of $(n-1)\times(n-1)$ matrices, such as $A_{i,1}$. Just write:

$$\begin{aligned}
D(A)&=\sum_{i=1}^n(-1)^{1+i}a_{i,1}D(A_{i,1})&&\text{(Equation 4.8.9)}\\
&=\sum_{i=1}^n(-1)^{1+i}a_{i,1}\bigl(\beta D(B_{i,1})+\gamma D(C_{i,1})\bigr)&&\text{(Inductive assumption)}\\
&=\beta\sum_{i=1}^n(-1)^{1+i}a_{i,1}D(B_{i,1})+\gamma\sum_{i=1}^n(-1)^{1+i}a_{i,1}D(C_{i,1})\\
&=\beta D(B)+\gamma D(C).
\end{aligned}\tag{A15.3}$$

This proves the case $k > 1$. Now for the case $k = 1$:


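$$\begin{aligned}
D(A)&=\sum_{i=1}^n(-1)^{1+i}a_{i,1}D(A_{i,1}) = \sum_{i=1}^n(-1)^{1+i}\underbrace{(\beta b_{i,1}+\gamma c_{i,1})}_{=a_{i,1}\text{ by definition}}D(A_{i,1})\\
&=\beta\sum_{i=1}^n(-1)^{1+i}b_{i,1}\underbrace{D(A_{i,1})}_{=D(B_{i,1})}+\gamma\sum_{i=1}^n(-1)^{1+i}c_{i,1}\underbrace{D(A_{i,1})}_{=D(C_{i,1})}\\
&=\beta D(B)+\gamma D(C).
\end{aligned}\tag{A15.4}$$

The matrix $\tilde A$ is the same as $A$ with the jth and kth columns exchanged.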


FIGURE A15.1. The black square is in the jth row of the matrix (in this case the 6th). But after removing the first column and ith row, it is in the 5th row of the new matrix. So when we exchange the first and second columns, the determinant of the unshaded matrix will multiply $a_{i,1}a_{j,2}$ in both cases, but will contribute to the determinant with opposite sign.


This proves multilinearity of our function D. 

(2) Antisymmetry 

We want to prove $D(A) = -D(\tilde A)$, where $\tilde A$ is formed by exchanging the jth and kth columns of $A$.

Again, we have two cases to consider. The first, where both $j$ and $k$ are greater than 1, is proved by induction: we assume the function $D$ is antisymmetric for $(n-1)\times(n-1)$ matrices, so that in particular $D(A_{i,1}) = -D(\tilde A_{i,1})$ for each $i$, and we will show that if so, it is true for $n\times n$ matrices.

$$D(A) = \sum_{i=1}^n(-1)^{1+i}a_{i,1}D(A_{i,1})\underbrace{=}_{\text{by induction}}-\sum_{i=1}^n(-1)^{1+i}a_{i,1}D(\tilde A_{i,1}) = -D(\tilde A).\tag{A15.5}$$


The case where either $j$ or $k$ equals 1 is more unpleasant. Let’s assume $j = 1$, $k = 2$. Our approach will be to go one level deeper into our recursive formula, expressing $D(A)$ not just in terms of $(n-1)\times(n-1)$ matrices, but in terms of $(n-2)\times(n-2)$ matrices: the matrix $A_{i,m;1,2}$, formed by removing the first and second columns and the ith and mth rows of $A$.

In the second line of Equation A15.6 below, the entire expression within big parentheses gives $D(A_{i,1})$, in terms of $D(A_{i,m;1,2})$:

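$$\begin{aligned}
D(A)&=\sum_{i=1}^n(-1)^{i+1}a_{i,1}D(A_{i,1})\\
&=\sum_{i=1}^n(-1)^{i+1}a_{i,1}\underbrace{\Bigl(\underbrace{\sum_{m=1}^{i-1}(-1)^{m+1}a_{m,2}D(A_{i,m;1,2})}_{\text{terms where }m<i}+\underbrace{\sum_{m=i+1}^{n}(-1)^{m}a_{m,2}D(A_{i,m;1,2})}_{\text{terms where }m>i}\Bigr)}_{D(A_{i,1})\text{, in terms of }D\text{ of }A_{i,m;1,2}\text{, considered in two parts}}.
\end{aligned}\tag{A15.6}$$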


There are two sums within the term in parentheses, because in going from the matrix $A$ to the matrix $A_{i,1}$, the ith row was removed, as shown in Figure A15.1. Then, in creating $A_{i,m;1,2}$ from $A_{i,1}$, we remove the mth row (and the second column) of $A$. When we write $D(A_{i,1})$ we must thus remember that the ith row is missing, and hence $a_{m,2}$ is in the $(m-1)$st row of $A_{i,1}$ when $m > i$. We do that by summing separately, for each value of $i$, the terms with $m$ from 1 to $i-1$ and those with $m$ from $i+1$ to $n$, carefully using the sign $(-1)^{m-1+1} = (-1)^m$ for the second batch. (For $i = 4$, how many terms are there with $m$ from 1 to $i-1$? With $m$ from $i+1$ to $n$?⁴)


⁴As shown in Figure A15.1, there are three terms in the first sum, $m = 1, 2, 3$, and $n-4$ terms in the second.

Exactly the same computation for $\tilde A$ leads to

$$D(\tilde A) = \sum_{j=1}^n(-1)^{j+1}\tilde a_{j,1}\Bigl(\sum_{p=1}^{j-1}(-1)^{p+1}\tilde a_{p,2}D(\tilde A_{j,p;1,2})+\sum_{p=j+1}^{n}(-1)^{p}\tilde a_{p,2}D(\tilde A_{j,p;1,2})\Bigr).\tag{A15.7}$$

Let us look at one particular term of the double sum of Equation A15.7, corresponding to some $j$ and $p$:

$$\begin{cases}(-1)^{j+p}\,\tilde a_{j,1}\tilde a_{p,2}D(\tilde A_{j,p;1,2})&\text{if }p<j\\(-1)^{j+p+1}\,\tilde a_{j,1}\tilde a_{p,2}D(\tilde A_{j,p;1,2})&\text{if }p>j.\end{cases}\tag{A15.8}$$

Remember that $\tilde a_{j,1} = a_{j,2}$, $\tilde a_{p,2} = a_{p,1}$, and $\tilde A_{j,p;1,2} = A_{p,j;1,2}$. Thus we can rewrite Equation A15.8 as

$$\begin{cases}(-1)^{j+p}\,a_{j,2}a_{p,1}D(A_{p,j;1,2})&\text{if }p<j\\(-1)^{j+p+1}\,a_{j,2}a_{p,1}D(A_{p,j;1,2})&\text{if }p>j.\end{cases}\tag{A15.9}$$

This is the term corresponding to $i = p$ and $m = j$ in Equation A15.6, but with the opposite sign. □


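Let us illustrate this in a particular example. Focus on the 2 and the 8:

$$D\underbrace{\begin{bmatrix}1&5&-&-\\2&6&-&-\\3&7&-&-\\4&8&-&-\end{bmatrix}}_{A}
= 1\,D\begin{bmatrix}6&-&-\\7&-&-\\8&-&-\end{bmatrix}
-2\,D\begin{bmatrix}5&-&-\\7&-&-\\8&-&-\end{bmatrix}
+3\,D\begin{bmatrix}5&-&-\\6&-&-\\8&-&-\end{bmatrix}
-4\,D\begin{bmatrix}5&-&-\\6&-&-\\7&-&-\end{bmatrix}\tag{A15.10}$$

$$D\underbrace{\begin{bmatrix}5&1&-&-\\6&2&-&-\\7&3&-&-\\8&4&-&-\end{bmatrix}}_{\tilde A}
= 5\,D\begin{bmatrix}2&-&-\\3&-&-\\4&-&-\end{bmatrix}
-6\,D\begin{bmatrix}1&-&-\\3&-&-\\4&-&-\end{bmatrix}
+7\,D\begin{bmatrix}1&-&-\\2&-&-\\4&-&-\end{bmatrix}
-8\,D\begin{bmatrix}1&-&-\\2&-&-\\3&-&-\end{bmatrix}\tag{A15.11}$$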


Expanding the second term on the right-hand side of Equation A15.10 gives

$$-2\left(5\,D\begin{bmatrix}-&-\\-&-\end{bmatrix}-7\,D\begin{bmatrix}-&-\\-&-\end{bmatrix}+8\,D\begin{bmatrix}-&-\\-&-\end{bmatrix}\right);\tag{A15.12}$$

expanding the fourth term on the right-hand side of Equation A15.11 gives

$$-8\left(1\,D\begin{bmatrix}-&-\\-&-\end{bmatrix}-2\,D\begin{bmatrix}-&-\\-&-\end{bmatrix}+3\,D\begin{bmatrix}-&-\\-&-\end{bmatrix}\right).\tag{A15.13}$$

What about the other terms in Equations A15.12 and A15.13? Each term from the expansion of $A$ corresponds to a term of the expansion of $\tilde A$, identical but with opposite sign. For example, the term $-2\left(5\,D\begin{bmatrix}-&-\\-&-\end{bmatrix}\right)$ of Equation A15.12 corresponds to the first term gotten by expanding the first term on the right-hand side of Equation A15.11.


The first gives $-16\,D\begin{bmatrix}-&-\\-&-\end{bmatrix}$, the second $+16\,D\begin{bmatrix}-&-\\-&-\end{bmatrix}$. The two blank matrices here are identical, so the terms are identical, with opposite signs.

Why does this happen? In the matrix $A$, the 8 in the second column is below the 2 in the first column, so when the second row (with the 2) is removed, the 8 is in the third row, not the fourth. Therefore, $8\,D\begin{bmatrix}-&-\\-&-\end{bmatrix}$ comes with positive sign: $(-1)^{i+1} = (-1)^4 = +1$. In the matrix $\tilde A$, the 2 in the second column is above the 8 in the first column, so when the fourth row (with the 8) is removed, the 2 is still in the second row. Therefore, $2\,D\begin{bmatrix}-&-\\-&-\end{bmatrix}$ comes with negative sign: $(-1)^{i+1} = (-1)^3 = -1$.

We chose our 2 and 8 arbitrarily, so the same argument is true for any pair consisting of one entry from the first column and one from the second. (What would happen if we chose two entries from the same row, e.g., the 2 and 6 above?⁵ What happens if the first two columns are identical?⁶)


(3) Normalization 

The normalization condition is much simpler. If $A = [\mathbf e_1,\dots,\mathbf e_n]$, then in the first column, only the first entry $a_{1,1} = 1$ is nonzero, and $A_{1,1}$ is the identity matrix one size smaller, so that $D$ of it is 1 by induction. So

$$D(A) = a_{1,1}D(A_{1,1}) = 1,\tag{A15.14}$$

and we have also proved property (3). This completes the proof of existence; 
uniqueness is proved in Section 4.8. □ 
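The three properties can be spot-checked on a concrete matrix. The following is a minimal sketch of the development along the first column, written recursively with exact integer arithmetic; the helper names (`minor`, `with_column`, `swap_columns`) and the sample matrix are illustrative choices, and a finite check like this is of course not a substitute for the proof.

```python
def minor(A, i, j):
    """Matrix A with row i and column j removed (0-indexed)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]

def D(A):
    """Development along the first column.

    With 0-indexed row i, the book's sign (-1)^(1 + (i+1)) equals (-1)^i.
    """
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** i * A[i][0] * D(minor(A, i, 0)) for i in range(n))

def with_column(A, k, col):
    """Copy of A whose k-th column is replaced by col."""
    return [row[:k] + [col[r]] + row[k + 1:] for r, row in enumerate(A)]

def swap_columns(A, j, k):
    """Copy of A with columns j and k exchanged."""
    B = [row[:] for row in A]
    for row in B:
        row[j], row[k] = row[k], row[j]
    return B

if __name__ == "__main__":
    A = [[2, -1, 0, 3], [1, 4, 2, -2], [0, 5, 1, 1], [3, 2, -1, 4]]
    b, c, beta, gamma = [1, 0, 2, -1], [3, 1, -2, 5], 2, -3
    for k in range(4):
        col = [beta * b[r] + gamma * c[r] for r in range(4)]
        lhs = D(with_column(A, k, col))
        rhs = beta * D(with_column(A, k, b)) + gamma * D(with_column(A, k, c))
        print("multilinear in column", k + 1, lhs == rhs)
    print("antisymmetric:", all(D(swap_columns(A, j, k)) == -D(A)
                                for j in range(4) for k in range(j + 1, 4)))
    identity = [[1 if r == s else 0 for s in range(4)] for r in range(4)]
    print("normalized:", D(identity) == 1)
```

The recursion takes on the order of $n!$ steps, so it is useful only as an illustration for small matrices.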


A.16 Rigorous Proof of the Change of Variables Formula


Here we prove the change of variables formula, Theorem 4.10.12. The proof is 
just a (lengthy) matter of dotting the i’s of the sketch in Section 4.10. 

⁵This is impossible, since when we go one level deeper, that row is erased.

⁶The determinant is 0, since each term has a term that is identical to it but with opposite sign.





Theorem 4.10.12 (Change of variables formula). Let $X$ be a compact subset of $\mathbb R^n$ with boundary $\partial X$ of volume 0, and $U$ an open neighborhood of $X$. Let $\Phi:U\to\mathbb R^n$ be a $C^1$ mapping with Lipschitz derivative, that is one to one on $(X-\partial X)$, and such that $[D\Phi(\mathbf x)]$ is invertible at every $\mathbf x\in(X-\partial X)$. Set $Y = \Phi(X)$.

Then if $f:Y\to\mathbb R$ is integrable, then $(f\circ\Phi)\,|\det[D\Phi]|$ is integrable on $X$, and

$$\int_Yf(\mathbf v)\,|d^n\mathbf v| = \int_X(f\circ\Phi)(\mathbf u)\,|\det[D\Phi(\mathbf u)]|\,|d^n\mathbf u|.$$
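To see the formula at work before reading the proof, here is a small numerical sketch using polar coordinates. The choices below are assumptions made for illustration and do not come from the text: $X = [0,1]\times[0,2\pi]$, $\Phi(r,\theta) = (r\cos\theta, r\sin\theta)$, so that $Y$ is the unit disk and $|\det[D\Phi(r,\theta)]| = r$, and $f(x,y) = x^2+y^2$, whose integral over the disk is $\pi/2$. Both sides are approximated by midpoint Riemann sums.

```python
import math

def f(x: float, y: float) -> float:
    """Sample integrand on the unit disk Y (an illustrative choice)."""
    return x * x + y * y

def cartesian_sum(N: int) -> float:
    """Midpoint Riemann sum of f over the unit disk, on squares of side 1/N."""
    h = 1.0 / N
    total = 0.0
    for i in range(-N, N):
        x = (i + 0.5) * h
        for j in range(-N, N):
            y = (j + 0.5) * h
            if x * x + y * y <= 1.0:  # keep only midpoints inside the disk
                total += f(x, y) * h * h
    return total

def polar_sum(N: int) -> float:
    """Midpoint Riemann sum of (f o Phi)|det[D Phi]| over X = [0,1] x [0,2*pi]."""
    dr = 1.0 / N
    dtheta = 2.0 * math.pi / N
    total = 0.0
    for i in range(N):
        r = (i + 0.5) * dr
        for j in range(N):
            theta = (j + 0.5) * dtheta
            x, y = r * math.cos(theta), r * math.sin(theta)
            total += f(x, y) * r * dr * dtheta  # |det[D Phi(r, theta)]| = r
    return total

if __name__ == "__main__":
    print(cartesian_sum(200), polar_sum(200), math.pi / 2)
```

Note that this $\Phi$ is one to one only on the interior of $X$, and its derivative is not invertible where $r = 0$; both failures occur on $\partial X$, which is exactly the situation the hypotheses of the theorem allow.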


The second line of Equation A16.1 is a Riemann sum; $\mathbf x_C$ is the point in $C$ where $\Phi$ is evaluated: midpoint, lower left-hand corner, or some other choice.


Proof. As shown in Figure A16.1, we will use the dyadic decomposition of $X$, and the image decomposition for $Y$, whose paving blocks are the $\Phi(C\cap X)$, $C\in\mathcal D_N(\mathbb R^n)$. We will call this partition $\Phi(\mathcal D_N(X))$. The outline of the proof is as follows:


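$$\begin{aligned}
\int_Yf\,|d^n\mathbf v|&\approx\sum_{C\in\mathcal D_N(\mathbb R^n)}\underbrace{M_{\Phi(C)}f\,\operatorname{vol}_n\Phi(C)}_{\substack{\text{sup }f\text{ over curvy cube}\\\text{times vol. of curvy cube}}}\\
&\approx\sum_{C\in\mathcal D_N(\mathbb R^n)}M_C(f\circ\Phi)\bigl(\operatorname{vol}_nC\,|\det[D\Phi(\mathbf x_C)]|\bigr)\\
&\approx\int_X(f\circ\Phi)(\mathbf x)\,|\det[D\Phi(\mathbf x)]|\,|d^n\mathbf x|,
\end{aligned}\tag{A16.1}$$

where the $\mathbf x_C$ in the second line is some $\mathbf x$ in $C$. The $\approx$ become equalities in the limit.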


A cube $C\in\mathcal D_N(\mathbb R^n)$ has side-length $1/2^N$, and (see Exercise 4.1.5) the distance between two points $\mathbf x,\mathbf y$ in the same cube $C$ is

$$|\mathbf x-\mathbf y|\le\frac{\sqrt n}{2^N}.$$

So the maximum distance between two points of $\Phi(C)$ is $K\sqrt n/2^N$, i.e., $\Phi(C)$ is contained in the box $C'$ centered at $\Phi(\mathbf z_C)$ with side-length $K\sqrt n/2^N$.


(1) To justify the first $\approx$, we need to show that the image decomposition of $Y$, $\Phi(\mathcal D_N(X))$, is a nested partition.

(2) To justify the second $\approx$ (this is the hard part) we need to show that as $N\to\infty$, the volume of a curvy cube of the image decomposition equals the volume of a cube of the original dyadic decomposition times $|\det[D\Phi(\mathbf x_C)]|$.

(3) The third $\approx$ is simply the definition of the integral as the limit of a Riemann sum.


We need Proposition A16.1 (which is of interest in its own right) for (1): to show that $\Phi(\mathcal D_N(X))$ is a nested partition. It will also be used at the end of the proof of the Change of Variables Formula.
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U ’ 



Proposition A16.1 (Volume of the image by a $C^1$ map). Let $Z\subset\mathbb R^n$ be a compact pavable subset of $\mathbb R^n$, $U$ an open neighborhood of $Z$ and $\Phi:U\to\mathbb R^n$ a $C^1$ mapping with bounded derivative. Set $K = \sup_{\mathbf x\in U}|[D\Phi(\mathbf x)]|$.

Then

$$\operatorname{vol}_n\Phi(Z)\le(K\sqrt n)^n\operatorname{vol}_nZ.$$

In particular, if $\operatorname{vol}_nZ = 0$, then $\operatorname{vol}_n\Phi(Z) = 0$.

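Proof. Choose $\epsilon > 0$ and $N > 0$ so large that

$$A = \bigcup_{\substack{C\in\mathcal D_N(\mathbb R^n),\\\bar C\cap Z\ne\emptyset}}\bar C\subset U\quad\text{and}\quad\operatorname{vol}_nA\le\operatorname{vol}_nZ+\epsilon.\tag{A16.2}$$

(Recall that $\bar C$ denotes the closure of $C$.) Let $\mathbf z_C$ be the center of one of the cubes $C$ above. Then by Corollary 1.9.2, when $\mathbf z\in C$ we have

$$|\Phi(\mathbf z_C)-\Phi(\mathbf z)|\le K|\mathbf z_C-\mathbf z|.\tag{A16.3}$$

(The distance between the two points in the image is at most $K$ times the distance between the corresponding points of the domain.) Therefore $\Phi(C)$ is contained in the box $C'$ centered at $\Phi(\mathbf z_C)$ with side-length $K\sqrt n/2^N$.

Finally,

$$\Phi(Z)\subset\bigcup_{\substack{C\in\mathcal D_N(\mathbb R^n),\\\bar C\cap Z\ne\emptyset}}C',\quad\text{so}\tag{A16.4}$$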

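$$\operatorname{vol}_n\Phi(Z)\le\sum_{\substack{C\in\mathcal D_N(\mathbb R^n),\\\bar C\cap Z\ne\emptyset}}\operatorname{vol}_nC' = \underbrace{\frac{(K\sqrt n/2^N)^n}{(1/2^N)^n}}_{\text{ratio }\operatorname{vol}_nC'\text{ to }\operatorname{vol}_nC}\sum_{\substack{C\in\mathcal D_N(\mathbb R^n),\\\bar C\cap Z\ne\emptyset}}\operatorname{vol}_nC = (K\sqrt n)^n\operatorname{vol}_nA\le(K\sqrt n)^n(\operatorname{vol}_nZ+\epsilon).\quad\square\tag{A16.5}$$

Figure A16.1. The $C^1$ mapping $\Phi$ maps $X$ to $Y$. We will use in the proof the fact that $\Phi$ is defined on $U$, not just on $X$.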


Corollary A16.2. The partition $\Phi(\mathcal D_N(X))$ is a nested partition of $Y$.


Proof. The three conditions to be verified are that the pieces are nested, that the diameters tend to 0 as $N$ tends to infinity, and that the boundaries of the pieces have volume 0. The first is clear: if $C_1\subset C_2$, then $\Phi(C_1)\subset\Phi(C_2)$. The second is the same as Equation A16.3, and the third follows from the second part of Proposition A16.1. □


Our next proposition contains the real substance of the change of variables theorem. It says exactly why we can replace the volume of the little curvy parallelogram $\Phi(C)$ by its approximate volume $|\det[D\Phi(\mathbf x)]|\operatorname{vol}_nC$.
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Proposition A 16.4 is the main 
tool for proving Theorem 4.10.12. It says that for a change of variables mapping $\Phi$, the image $\Phi(C)$ of a cube $C$ centered at 0 is arbitrarily close to the image of $C$ by the derivative of $\Phi$ at 0, as shown in Figures A16.2 and A16.3.

Recall that “bijective” means 
one to one and onto. 


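Why does Equation A16.9 prove the right-hand inclusion? We want to know that if $\mathbf x\in C$, then

$$\Phi(\mathbf x)\in(1+\epsilon)[D\Phi(0)](C),$$

or equivalently,

$$[D\Phi(0)]^{-1}\Phi(\mathbf x)\in(1+\epsilon)C.$$

Since

$$[D\Phi(0)]^{-1}\Phi(\mathbf x) = \mathbf x+[D\Phi(0)]^{-1}\bigl(\Phi(\mathbf x)-[D\Phi(0)](\mathbf x)\bigr),$$

$[D\Phi(0)]^{-1}\Phi(\mathbf x)$ is distance

$$\bigl|[D\Phi(0)]^{-1}\bigl(\Phi(\mathbf x)-[D\Phi(0)](\mathbf x)\bigr)\bigr|$$

from $\mathbf x$. But the ball of radius $\epsilon|\mathbf x|/\sqrt n$ around any point $\mathbf x\in C$ is completely contained in $(1+\epsilon)C$, by Lemma A16.3.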


Every time you want to compare balls and cubes in $\mathbb R^n$, there is a pesky $\sqrt n$ which complicates the formulas. We will need to do this several times in the proof of A16.4, and the following lemma isolates what we need.


Lemma A16.3. Choose $0 < a < b$, and let $C_a$ and $C_b$ be the cubes centered at the origin of side length $2a$ and $2b$ respectively, i.e., the cubes defined by $|x_i|\le a$ (respectively $|x_i|\le b$), $i = 1,\dots,n$. Then the ball of radius

$$\frac{(b-a)|\mathbf x|}{a\sqrt n}\tag{A16.6}$$

around any point $\mathbf x$ of $C_a$ is contained in $C_b$.


Proof. First note that if $\mathbf x\in C_a$, then $|\mathbf x|\le a\sqrt n$. Let $\mathbf x+\mathbf h$ be a point of the ball. Then

$$|h_i|\le|\mathbf h|<\frac{(b-a)|\mathbf x|}{a\sqrt n}\le\frac{(b-a)\,a\sqrt n}{a\sqrt n} = b-a.\tag{A16.7}$$

Thus $|x_i+h_i|\le|x_i|+|h_i|<a+b-a = b$. □
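Lemma A16.3 is easy to probe by random sampling. The sketch below draws points $\mathbf x$ of $C_a$ and displacements $\mathbf h$ with $|\mathbf h|$ below the radius of Equation A16.6, and checks that $\mathbf x+\mathbf h$ stays in $C_b$. The function name, the number of trials and the seed are arbitrary illustrative choices; a random test can only fail to find a counterexample, it proves nothing.

```python
import math
import random

def ball_stays_in_cube(a: float, b: float, n: int, trials: int = 10000, seed: int = 0) -> bool:
    """Sample x in C_a and h with |h| < (b-a)|x|/(a*sqrt(n)); check x+h lies in C_b."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-a, a) for _ in range(n)]
        norm_x = math.sqrt(sum(xi * xi for xi in x))
        radius = (b - a) * norm_x / (a * math.sqrt(n))
        # Random direction (Gaussian vector) scaled to a random length below the radius.
        d = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm_d = math.sqrt(sum(di * di for di in d))
        if norm_d == 0.0:
            continue
        scale = rng.random() * radius / norm_d
        # Small tolerance guards against floating-point rounding only.
        if any(abs(xi + di * scale) > b + 1e-12 for xi, di in zip(x, d)):
            return False
    return True

if __name__ == "__main__":
    print(ball_stays_in_cube(1.0, 1.5, 3))
    print(ball_stays_in_cube(0.2, 0.25, 5))
```

In the proof of Proposition A16.4 the lemma is applied with $b = (1+\epsilon)a$, which gives the radius $\epsilon|\mathbf x|/\sqrt n$, and with $a = (1-\epsilon)b$, which gives $\epsilon|\mathbf x|/(\sqrt n(1-\epsilon))$.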


Proposition A16.4. Let $U$, $V$ be open subsets in $\mathbb R^n$ with $0\in U$ and $0\in V$. Let $\Phi:U\to V$ be a differentiable mapping with $\Phi(0) = 0$. Suppose that $\Phi$ is bijective, $[D\Phi]$ is Lipschitz, and that $\Phi^{-1}:V\to U$ is also differentiable with Lipschitz derivative. Let $M$ be a Lipschitz constant for $[D\Phi]$ and $[D\Phi^{-1}]$. Then

(a) For any $\epsilon > 0$, there exists $\delta > 0$ such that if $C$ is a cube centered at 0 of side $< 2\delta$, then

$$(1-\epsilon)[D\Phi(0)]C\subset\underbrace{\Phi(C)}_{\substack{\text{squeezed between}\\\text{right and left sides}\\\text{as }\epsilon\to0}}\subset(1+\epsilon)[D\Phi(0)]C.\tag{A16.8}$$

(b) We can choose $\delta$ to depend only on $\epsilon$, $|[D\Phi(0)]|$, $|[D\Phi(0)]^{-1}|$, and the Lipschitz constant $M$, but no other information about $\Phi$.


Proof. The right-hand and the left-hand inclusions of Equation A16.8 require slightly different treatments. They are both consequences of Proposition A2.1, and you should remember that the largest n-dimensional cube contained in a ball of radius $r$ has side-length $2r/\sqrt n$.

The right-hand inclusion, illustrated by Figure A16.2, is gotten by finding a $\delta$ such that if the side-length of $C$ is less than $2\delta$, and $\mathbf x\in C$, then

$$\bigl|[D\Phi(0)]^{-1}\bigl(\Phi(\mathbf x)-[D\Phi(0)](\mathbf x)\bigr)\bigr|\le\frac{\epsilon|\mathbf x|}{\sqrt n}.\tag{A16.9}$$

FIGURE A16.2. The cube $C$ is mapped to $\Phi(C)$, which is almost $[D\Phi(0)](C)$, and definitely inside $(1+\epsilon)[D\Phi(0)](C)$. As $\epsilon\to0$, the image $\Phi(C)$ becomes more and more exactly the parallelepiped $[D\Phi(0)]C$.


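According to Proposition A2.1,

$$\bigl|[D\Phi(0)]^{-1}\bigl(\Phi(\mathbf x)-[D\Phi(0)](\mathbf x)\bigr)\bigr|\le\frac{|[D\Phi(0)]^{-1}|\,M|\mathbf x|^2}{2},\tag{A16.10}$$

so it is enough to require that when $\mathbf x\in C$,

$$\frac{|[D\Phi(0)]^{-1}|\,M|\mathbf x|^2}{2}\le\frac{\epsilon|\mathbf x|}{\sqrt n},\quad\text{i.e.,}\quad|\mathbf x|\le\frac{2\epsilon}{\sqrt n\,|[D\Phi(0)]^{-1}|\,M}.\tag{A16.11}$$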


Again, it isn’t immediately obvious why the left-hand inclusion of Equation A16.8 follows from the inequality A16.13. We need to show that if $\mathbf x\in(1-\epsilon)C$, then $[D\Phi(0)]\mathbf x\in\Phi(C)$.

Apply $\Phi^{-1}$ to both sides to get $\Phi^{-1}([D\Phi(0)]\mathbf x)\in C$. Inequality A16.13 asserts that $\Phi^{-1}([D\Phi(0)]\mathbf x)$ is within

$$\frac{\epsilon}{\sqrt n(1-\epsilon)}|\mathbf x|\quad\text{of }\mathbf x,$$

but the ball of that radius around any point of $(1-\epsilon)C$ is contained in $C$, again by Lemma A16.3.


Since $\mathbf x\in C$ and $C$ has side-length $2\delta$, we have $|\mathbf x|<\delta\sqrt n$, so the right-hand inclusion will be satisfied if

$$\delta = \frac{2\epsilon}{Mn\,|[D\Phi(0)]^{-1}|}.\tag{A16.12}$$


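For the left-hand inclusion, illustrated by Figure A16.3, we need to find $\delta$ such that when $C$ has side-length $\le2\delta$, then

$$\bigl|\Phi^{-1}([D\Phi(0)]\mathbf x)-\mathbf x\bigr|\le\frac{\epsilon}{\sqrt n(1-\epsilon)}|\mathbf x|\tag{A16.13}$$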


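when $\mathbf x\in(1-\epsilon)C$. Again this follows from A2.1. Set $\mathbf y = [D\Phi(0)]\mathbf x$. Then we find

$$\bigl|\Phi^{-1}([D\Phi(0)]\mathbf x)-\mathbf x\bigr| = \bigl|\Phi^{-1}\mathbf y-[D\Phi^{-1}(0)]\mathbf y\bigr|\le\frac M2|\mathbf y|^2 = \frac M2\bigl|[D\Phi(0)]\mathbf x\bigr|^2\le\frac M2\bigl|[D\Phi(0)]\bigr|^2|\mathbf x|^2.\tag{A16.14}$$

Our inequality will be satisfied if

$$\frac M2\bigl|[D\Phi(0)]\bigr|^2|\mathbf x|^2\le\frac{\epsilon}{\sqrt n(1-\epsilon)}|\mathbf x|.\tag{A16.15}$$

Remember that $\mathbf x\in(1-\epsilon)C$, so $|\mathbf x|<(1-\epsilon)\delta\sqrt n$, and the left-hand inclusion is satisfied if we take

$$\delta = \frac{2\epsilon}{(1-\epsilon)^2\,n\,M\,|[D\Phi(0)]|^2}.\tag{A16.16}$$

Choose the smaller of the two deltas. □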

Proof of the change of variables formula, continued 

Proposition A16.4 goes a long way towards proving the change of variables formula; still, the integral is defined in terms of upper and lower sums, and we must translate the statement into that language.






Proposition A16.5. Let $U$ and $V$ be bounded subsets in $\mathbb{R}^n$ and let $\Phi : U \to V$ be a differentiable mapping with Lipschitz derivative, that is bijective, and such that $\Phi^{-1} : V \to U$ is also differentiable with Lipschitz derivative.

Then for any $\eta > 0$, there exists $N$ such that if $C \in \mathcal{D}_N(\mathbb{R}^n)$ and $C \subset U$, then

$$(1-\eta)\,M_C(|\det[D\Phi]|)\,\mathrm{vol}\,C \le \mathrm{vol}\,\Phi(C) \le (1+\eta)\,m_C(|\det[D\Phi]|)\,\mathrm{vol}\,C. \tag{A16.17}$$

Figure A16.3. The parallelepiped $(1-\varepsilon)[D\Phi(0)](C)$ is mapped by $\Phi^{-1}$ almost to $(1-\varepsilon)C$, and definitely inside $C$. Therefore, the image of $C$ covers $(1-\varepsilon)[D\Phi(0)](C)$.

Proof of Proposition A16.5. Choose $\eta > 0$, and find $\varepsilon > 0$ so that

$$(1+\varepsilon)^{n+1} < 1+\eta \quad\text{and}\quad (1-\varepsilon)^{n+1} > 1-\eta.$$

For this $\varepsilon$, find $N_1$ such that Proposition A16.4 is true for every cube $C \in \mathcal{D}_{N_1}(\mathbb{R}^n)$ such that $C \subset U$.

Next find $N_2$ such that for every cube $C \in \mathcal{D}_{N_2}(\mathbb{R}^n)$ with $C \subset U$, we have

$$\frac{M_C|\det[D\Phi]|}{m_C|\det[D\Phi]|} < 1+\varepsilon \quad\text{and}\quad \frac{m_C|\det[D\Phi]|}{M_C|\det[D\Phi]|} > 1-\varepsilon. \tag{A16.18}$$

Actually the second inequality follows from the first, since $1/(1+\varepsilon) > 1-\varepsilon$.
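The choice of $\varepsilon$ from $\eta$ in this proof can be made explicit. The following is a minimal sketch, not part of the original argument; the function name `pick_epsilon` and the restriction $0 < \eta < 1$ are assumptions made for illustration.

```python
def pick_epsilon(eta: float, n: int) -> float:
    """Return eps > 0 with (1+eps)**(n+1) < 1+eta and (1-eps)**(n+1) > 1-eta.

    Hypothetical helper: it assumes 0 < eta < 1 and a dimension n >= 1.
    """
    if not 0.0 < eta < 1.0:
        raise ValueError("this sketch assumes 0 < eta < 1")
    # Largest eps permitted by the first inequality.
    upper = (1.0 + eta) ** (1.0 / (n + 1)) - 1.0
    # Largest eps permitted by the second inequality.
    lower = 1.0 - (1.0 - eta) ** (1.0 / (n + 1))
    # Halve the smaller bound so that both inequalities are strict.
    return 0.5 * min(upper, lower)


eps = pick_epsilon(0.1, 3)
```

Any smaller positive value works equally well; halving is only one convenient way to stay strictly inside both bounds.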


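If $N$ is the larger of $N_1$ and $N_2$, together these give

$$\mathrm{vol}_n\,\Phi(C) < (1+\varepsilon)^n\,|\det[D\Phi(0)]|\,\mathrm{vol}_n C, \tag{A16.19}$$

and we get

$$\mathrm{vol}_n\,\Phi(C) < (1+\varepsilon)^{n+1}\,m_C(|\det[D\Phi]|)\,\mathrm{vol}_n C. \tag{A16.20}$$

An exactly similar argument leads to

$$\mathrm{vol}_n\,\Phi(C) > (1-\varepsilon)^{n+1}\,M_C|\det[D\Phi]|\,\mathrm{vol}_n C. \quad\square \tag{A16.21}$$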
We can now prove the change of variables theorem. First, we may assume that the function $f$ to be integrated is positive. Call $M$ the Lipschitz constant of $[D\Phi]$, and set

$$K = \sup_{x\in X}|[D\Phi(x)]| \quad\text{and}\quad L = \sup_{x\in X}|f(x)|. \tag{A16.22}$$

Choose $\eta > 0$. First choose $N_1$ sufficiently large that the cubes $C \in \mathcal{D}_{N_1}(X)$ whose closures intersect the boundary of $X$ have total volume $< \eta$. We will denote by $Z$ the union of these cubes; it is a thickening of $\partial X$, the boundary of $X$.

Lemma A16.6. The closure of $X - Z$ is compact, and contains no point of $\partial X$.

Proof. For the first part, $X$ is bounded, so $X - Z$ is bounded, so its closure is closed and bounded. For the second, notice that for every point $a \in \mathbb{R}^n$, there is an $r > 0$ such that the ball $B_r(a)$ is contained in the union of the cubes of $\mathcal{D}_{N_1}(\mathbb{R}^n)$ with $a$ in their closure. So no sequence in $X - Z$ can converge to a point $a \in \partial X$. □


Recall that if $M$ is the Lipschitz constant of $[D\Phi]$, then

$$|[D\Phi(x)] - [D\Phi(y)]| \le M|x - y|$$

for all $x, y \in U$.

Since $K' = \sup|[D\Phi(x)]^{-1}|$, it is also $\sup|[D\Phi(y)]^{-1}|$, accounting for the $(K')^2$ in the second line of Equation A16.23.


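In particular, $[D\Phi]^{-1}$ is bounded on $X - Z$, say by $K'$, and it is also Lipschitz. This is seen by writing

$$\begin{aligned}
\bigl|[D\Phi(x)]^{-1} - [D\Phi(y)]^{-1}\bigr| &= \bigl|[D\Phi(x)]^{-1}\underbrace{\bigl([D\Phi(y)] - [D\Phi(x)]\bigr)}_{\le M|y-x|}[D\Phi(y)]^{-1}\bigr|\\
&\le (K')^2 M|x - y|.
\end{aligned} \tag{A16.23}$$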


So we can choose $N_2 > N_1$ so that Proposition A16.5 is true for all cubes in $\mathcal{D}_{N_2}$ contained in $X - Z$. We will call the cubes of $\mathcal{D}_{N_2}$ in $Z$ boundary cubes, and the others interior cubes.

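Then we have

$$\begin{aligned}
U_N\bigl((f\circ\Phi)|\det[D\Phi]|\bigr) &= \sum_{C\in\mathcal{D}_N(\mathbb{R}^n)} M_C\bigl((f\circ\Phi)|\det[D\Phi]|\bigr)\,\mathrm{vol}_n C\\
&= \sum_{\text{interior cubes } C} M_C\bigl((f\circ\Phi)|\det[D\Phi]|\bigr)\,\mathrm{vol}_n C + \sum_{\text{boundary cubes } C} M_C\bigl((f\circ\Phi)|\det[D\Phi]|\bigr)\,\mathrm{vol}_n C\\
&\le \sum_{\text{interior cubes } C} M_C\bigl((f\circ\Phi)|\det[D\Phi]|\bigr)\,\mathrm{vol}_n C + \eta L(K\sqrt n)^n\\
&\le \frac{1}{1-\eta}\sum_{C\in\mathcal{D}_N(\mathbb{R}^n)} M_{\Phi(C)}(f)\,\mathrm{vol}_n\Phi(C) + \eta L(K\sqrt n)^n\\
&= \frac{1}{1-\eta}\,U_{\Phi(\mathcal{D}_N(\mathbb{R}^n))}(f) + \eta L(K\sqrt n)^n.
\end{aligned} \tag{A16.24}$$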
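A similar argument about lower sums leads to

$$L_N\bigl((f\circ\Phi)|\det[D\Phi]|\bigr) \ge \frac{1}{1+\eta}\,L_{\Phi(\mathcal{D}_N(\mathbb{R}^n))}(f) - \eta L(K\sqrt n)^n. \tag{A16.25}$$

Proposition A16.1 explains the $(K\sqrt n)^n$ in Equation A16.25.

Putting these together leads to

$$\begin{aligned}
\frac{1}{1+\eta}\,L_{\Phi(\mathcal{D}_N(\mathbb{R}^n))}(f) - \eta L(K\sqrt n)^n &\le L_N\bigl((f\circ\Phi)|\det[D\Phi]|\bigr)\\
&\le U_N\bigl((f\circ\Phi)|\det[D\Phi]|\bigr) \le \frac{1}{1-\eta}\,U_{\Phi(\mathcal{D}_N(\mathbb{R}^n))}(f) + \eta L(K\sqrt n)^n.
\end{aligned} \tag{A16.26}$$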

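We can choose $N_2$ larger yet so that the difference between upper and lower sums satisfies

$$U_{\Phi(\mathcal{D}_{N_2}(\mathbb{R}^n))}(f) - L_{\Phi(\mathcal{D}_{N_2}(\mathbb{R}^n))}(f) < \eta, \tag{A16.27}$$

since $f$ is integrable and $\Phi(\mathcal{D}(\mathbb{R}^n))$ is a nested paving.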

If $a, b, c$ are positive numbers such that $|a - b| < \eta$, then

$$\Bigl(\frac{a}{1-\eta} + \eta c\Bigr) - \Bigl(\frac{b}{1+\eta} - \eta c\Bigr) = \frac{(a-b) + \eta(a+b)}{1-\eta^2} + 2\eta c < \frac{\eta(1 + a + b)}{1-\eta^2} + 2\eta c, \tag{A16.28}$$

which will be arbitrarily small when $\eta$ is arbitrarily small, so

$$U_N\bigl((f\circ\Phi)|\det[D\Phi]|\bigr) - L_N\bigl((f\circ\Phi)|\det[D\Phi]|\bigr) \tag{A16.29}$$

can be made arbitrarily small by choosing $\eta$ sufficiently small (and the corresponding $N_2$ sufficiently large). This proves that $(f\circ\Phi)|\det[D\Phi]|$ is integrable, and that the integral is equal to the integral of $f$. □
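As a numerical sanity check of the formula just proved, the sketch below compares a Riemann sum of $f$ over the unit disk with a Riemann sum of $(f\circ\Phi)|\det[D\Phi]|$ for the polar coordinates map $\Phi(r,\theta) = (r\cos\theta, r\sin\theta)$, where $|\det[D\Phi]| = r$. The integrand $f(x,y) = e^{-(x^2+y^2)}$, the grid sizes and the function names are all assumptions chosen for illustration; the polar map fails to be injective only on a set of volume 0, which the sketch ignores.

```python
import math


def f(x, y):
    # Sample integrand (an assumption for this illustration).
    return math.exp(-(x * x + y * y))


def cartesian_sum(cells=400):
    """Midpoint Riemann sum of f over the unit disk, on a grid covering [-1, 1]^2."""
    h = 2.0 / cells
    total = 0.0
    for i in range(cells):
        x = -1.0 + (i + 0.5) * h
        for j in range(cells):
            y = -1.0 + (j + 0.5) * h
            if x * x + y * y <= 1.0:  # keep only cells whose midpoint lies in the disk
                total += f(x, y)
    return total * h * h


def polar_sum(cells=200):
    """Midpoint Riemann sum of (f o Phi) * |det[D Phi]| over (0, 1) x (0, 2*pi)."""
    dr = 1.0 / cells
    dtheta = 2.0 * math.pi / cells
    total = 0.0
    for i in range(cells):
        r = (i + 0.5) * dr
        for j in range(cells):
            theta = (j + 0.5) * dtheta
            x, y = r * math.cos(theta), r * math.sin(theta)
            total += f(x, y) * r  # |det[D Phi(r, theta)]| = r
    return total * dr * dtheta


exact = math.pi * (1.0 - math.exp(-1.0))  # closed form of the integral over the disk
```

Both sums approach the same value as the grids are refined; the Cartesian sum converges more slowly because of the cells that straddle the boundary circle, which play the role of the boundary cubes in the proof.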


A.17 A Few Extra Results in Topology

In this section, we will give two more properties of compact subsets of $\mathbb{R}^n$, which we will need for proofs in Appendices A.18 and A.22. They are not particularly harder than the ones in Section 1.6, but it seemed a bad idea to load down that section with results which we did not need immediately.


Theorem A17.1 (Decreasing intersection of nested compact sets). If $X_k \subset \mathbb{R}^n$ is a sequence of non-empty compact sets, such that $X_1 \supset X_2 \supset \dots$, then

$$\bigcap_{k=1}^{\infty} X_k \ne \emptyset. \tag{A17.1}$$


Note that the hypothesis that the $X_k$ are compact is essential. For instance, the intervals $(0, 1/n)$ form a decreasing sequence of non-empty sets, but their intersection is empty; similarly, the sequence of unbounded intervals $[k, \infty)$ is a decreasing sequence of non-empty closed subsets, but its intersection is also empty.
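The two counterexamples can be checked mechanically: for any candidate point, one can name an interval in the sequence that excludes it. This is a small sketch using exact rational arithmetic; the function names are hypothetical.

```python
import math
from fractions import Fraction


def index_excluding_from_open(x: Fraction) -> int:
    """For x > 0, return n such that x is not in the open interval (0, 1/n)."""
    # ceil(1/x) >= 1/x, hence 1/n <= x.
    return max(1, math.ceil(1 / x))


def index_excluding_from_ray(x: Fraction) -> int:
    """Return k such that x is not in the closed ray [k, infinity)."""
    return math.floor(x) + 1


candidates = [Fraction(3, 10), Fraction(1, 4), Fraction(5), Fraction(1, 1000)]
```

Points $x \le 0$ lie in none of the intervals $(0, 1/n)$, so no point survives in either intersection. Neither family is a sequence of compact sets: the first sets are not closed, the second are not bounded.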

Proof. For each $k$, choose $x_k \in X_k$ (using the hypothesis that $X_k \ne \emptyset$). Since this is in particular a sequence in $X_1$, choose a convergent subsequence $x_{k_i}$. The limit of this sequence is a point of the intersection $\bigcap_m X_m$, since the sequence beyond $x_k$, $k > m$, is contained in $X_m$, hence the limit also, since each $X_m$ is closed. □

The next proposition constitutes the definition of "compact" in general topology; all other properties of compact sets can be derived from it. It will not play such a central role for us, but we will need it in the proof of the general Stokes's theorem in Appendix A.22.

Theorem A17.2 (Heine-Borel theorem). If $X \subset \mathbb{R}^n$ is compact, and $U_i \subset \mathbb{R}^n$ is a family of open subsets such that $X \subset \bigcup U_i$, then there exist finitely many of the open sets, say $U_1, \dots, U_N$, such that

$$X \subset U_1 \cup \dots \cup U_N. \tag{A17.2}$$

Proof. This is very similar to Theorem 1.6.2. We argue by contradiction: suppose it requires infinitely many of the $U_i$ to cover $X$.

The set $X$ is contained in a box $-10^N \le x_i \le 10^N$ for some $N$. Decompose this box into finitely many closed boxes of side 1 in the obvious way. If each of these boxes is covered by finitely many of the $U_i$, then all of $X$ is also, so at least one of the boxes $B_0$ requires infinitely many of the $U_i$ to cover it.

Now cut up $B_0$ into $10^n$ closed boxes of side 1/10 (in the plane, 100 boxes; in $\mathbb{R}^3$, 1,000 boxes). At least one of these smaller boxes must again require infinitely many of the $U_i$ to cover it. Call such a box $B_1$, and keep going: cut up $B_1$ into $10^n$ boxes of side $1/10^2$; again, at least one of these boxes must require infinitely many $U_i$ to cover it; call one such box $B_2$, etc.

The boxes $B_i$ form a decreasing sequence of compact sets, so there exists a point $x \in \bigcap B_i$. This point is in $X$, so it is in one of the $U_i$. That $U_i$ contains the ball of radius $r$ around $x$ for some $r > 0$, and hence contains all the boxes $B_j$ for $j$ sufficiently large (to be precise, as soon as $\sqrt n/10^j < r$).

This is a contradiction. □

A.18 Proof of the Dominated Convergence Theorem

The Italian mathematician Arzela proved the dominated convergence theorem in 1885.





Many famous mathematicians (Banach, Riesz, Landau, Hausdorff) have contributed proofs of their own. But the main contribution is certainly Lebesgue's; the result (in fact, a stronger result) is quite straightforward when Lebesgue integrals are used. The usual attitude of mathematicians today is that it is perverse to prove this result for the Riemann integral, as we do here; they feel that one should put it off until the Lebesgue integral is available, where it is easy and natural. We will follow the proof of a closely related result due to Eberlein, Comm. Pure App. Math., 10 (1957), pp. 357-360; the trick of using the parallelogram law is due to Marcel Riesz.


Theorem 4.11.12 (The dominated convergence theorem). Let $f_k : \mathbb{R}^n \to \mathbb{R}$ be a sequence of I-integrable functions, and let $f, g : \mathbb{R}^n \to \mathbb{R}$ be two I-integrable functions, such that

(1) $|f_k| \le g$ for all $k$;

(2) the set of $x$ where $\lim_{k\to\infty} f_k(x) \ne f(x)$ has volume 0.

Then

$$\lim_{k\to\infty}\int_{\mathbb{R}^n} f_k\,|d^nx| = \int_{\mathbb{R}^n} f\,|d^nx|.$$

Note that the term I-integrable refers to a form of the Riemann integral; see 
Definition 4.11.2. 
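To see what hypothesis (1) buys, the following sketch compares two sequences on $[0,1]$ that both converge pointwise to 0 off a set of volume 0. The sequences, the midpoint rule and all names are assumptions chosen for illustration: $f_k(x) = x^k$ is dominated by $g = 1$, while $k$ times the indicator of $(0, 1/k)$ is not dominated by any integrable function.

```python
def midpoint_integral(func, cells=2000):
    """Midpoint Riemann sum of func over [0, 1]."""
    h = 1.0 / cells
    return sum(func((j + 0.5) * h) for j in range(cells)) * h


def dominated(k):
    # f_k(x) = x**k satisfies |f_k| <= 1 on [0, 1] and tends to 0 for x < 1.
    return lambda x: x ** k


def undominated(k):
    # k times the indicator of (0, 1/k): tends to 0 pointwise, but has integral 1.
    return lambda x: float(k) if 0.0 < x < 1.0 / k else 0.0


ks = (1, 10, 100)
dominated_integrals = [midpoint_integral(dominated(k)) for k in ks]
undominated_integrals = [midpoint_integral(undominated(k)) for k in ks]
```

The first list tracks $1/(k+1)$ and tends to 0, as the theorem predicts; the second stays at 1, so the limit of the integrals is not the integral of the limit.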


Monotone convergence 

We will first prove an innocent-looking result about interchanging limits and 
integrals. Actually, much of the difficulty is concentrated in this proposition, 
which could be used as the basis of the entire theory. 


Proposition A18.1 (Monotone convergence). Let $f_k$ be a sequence of integrable functions, all with support in the unit cube $Q \subset \mathbb{R}^n$, and satisfying $1 \ge f_1 \ge f_2 \ge \dots \ge 0$. Let $B \subset Q$ be a pavable subset with $\mathrm{vol}_n(B) = 0$, and suppose that

$$\lim_{k\to\infty} f_k(x) = 0 \quad\text{if } x \notin B.$$

Then

$$\lim_{k\to\infty}\int_{\mathbb{R}^n} f_k\,|d^nx| = 0.$$


Remember that $f_k \le 1$, and $A_k$ is a subset of the unit cube, so the first term on the right-hand side of Equation A18.1 can be at most $K$.


Proof. The sequence $\int_{\mathbb{R}^n} f_k\,|d^nx|$ is non-increasing and non-negative, so it has a limit, which we call $2K$. We will suppose that $K > 0$, and derive a contradiction.

Let $A_k \subset Q$ be the set $A_k = \{x \in Q \mid f_k(x) \ge K\}$, so that since the sequence $f_k$ is non-increasing, the sets $A_k$ are nested: $A_1 \supset A_2 \supset \dots$ The object is to find a point $x \in \bigcap_k A_k$ that is not in $B$; then $\lim_{k\to\infty} f_k(x) \ge K$, which contradicts the hypothesis.

It is tempting to say that the intersection of the $A_k$'s is non-empty because they are nested, and $\mathrm{vol}_n(A_k) \ge K$ for all $k$, since otherwise

$$\int_Q f_k\,|d^nx| = \int_{A_k} f_k\,|d^nx| + \int_{Q - A_k} f_k\,|d^nx| < K + K, \tag{A18.1}$$
Recall (Definition 4.1.8) that we denote by $L(f)$ the lower integral of $f$:

$$L(f) = \lim_{N\to\infty} L_N(f).$$


The last inequality of Equation A18.2 isn't quite obvious. It is enough to show that

$$L_N(\sup(f_k(x), K)) \le K + L_N(\chi_{A_k})$$

for any $N$. Take any cube $C \in \mathcal{D}_N(\mathbb{R}^n)$. Then either $m_C(f_k) \le K$, in which case

$$m_C(f_k)\,\mathrm{vol}_n C \le K\,\mathrm{vol}_n C,$$

or $m_C(f_k) > K$. In the latter case, since $f_k \le 1$,

$$m_C(f_k)\,\mathrm{vol}_n C \le \mathrm{vol}_n C.$$

The first case contributes at most $K\,\mathrm{vol}_n Q = K$ to the lower integral, and the second case contributes at most $L_N(\chi_{A_k})$.


This is why the possible non-pavability of $A_k$ is just an irritant. For typical non-pavable sets, like the rationals or the irrationals, the lower volume is 0. The set $A_k$ is not like that: there definitely are whole dyadic cubes completely contained in $A_k$.


which contradicts the assumption that $\int_Q f_k\,|d^nx| \ge 2K$. Thus the intersection should have volume at least $K$, and since $B$ has volume 0, there should be points in the intersection that are not in $B$.

The problem with this argument is that $A_k$ might fail to be pavable (see Exercise A18.1), so we cannot blithely speak of its volume. In addition, even if the $A_k$ are pavable, their intersection might not be pavable (see Exercise A18.2). In this particular case this is just an irritant, not a fatal flaw; we need to doctor the $A_k$'s a bit. We can replace the volume by the lower volume, $\underline{\mathrm{vol}}_n(A_k)$, which can be thought of as the lower integral: $\underline{\mathrm{vol}}_n(A_k) = L(\chi_{A_k})$, or as the sum of the volumes of all the disjoint dyadic cubes of all sizes contained in $A_k$. Even this lower volume is larger than $K$, since $f_k(x) = \inf(f_k(x), K) + \sup(f_k(x), K) - K$:

$$\begin{aligned}
2K \le \int_Q f_k\,|d^nx| &= \int_Q \inf(f_k(x), K)\,|d^nx| + \int_Q \sup(f_k(x), K)\,|d^nx| - K\\
&\le \int_Q \sup(f_k(x), K)\,|d^nx| = L(\sup(f_k(x), K)) \le K + \underline{\mathrm{vol}}_n(A_k).
\end{aligned} \tag{A18.2}$$
JQ 

Now let us adjust our $A_k$'s. First, choose a number $N$ such that the union of all the dyadic cubes in $\mathcal{D}_N(\mathbb{R}^n)$ whose closures intersect $B$ has total volume $< K/3$. Let $B'$ be the union of all these cubes, and let $A'_k = A_k - B'$. Note that the $A'_k$ are still nested, and $\underline{\mathrm{vol}}_n(A'_k) \ge 2K/3$. Next choose $\varepsilon$ so small that $\varepsilon/(1-\varepsilon) < 2K/3$, and for each $k$ let $A''_k \subset A'_k$ be a finite union of closed dyadic cubes, such that $\underline{\mathrm{vol}}_n(A'_k - A''_k) < \varepsilon^k$. Unfortunately, now the $A''_k$ are no longer nested, so define

$$A'''_k = A''_1 \cap A''_2 \cap \dots \cap A''_k. \tag{A18.3}$$

We need to show that the $A'''_k$ are non-empty; this is true, since

$$\mathrm{vol}\,A'''_k \ge \underline{\mathrm{vol}}\,A'_k - (\varepsilon + \varepsilon^2 + \dots + \varepsilon^k) > \frac{2K}{3} - \frac{\varepsilon}{1-\varepsilon} > 0. \tag{A18.4}$$

Now the punchline: The $A'''_k$ form a decreasing sequence of compact sets, so their intersection is non-empty (see Theorem A17.1). Let $x \in \bigcap_k A'''_k$; then all $f_k(x) \ge K$, but $x \notin B$. This is the contradiction we were after. □


We use Proposition A18.1 below. 


Lemma A18.2. Let $h_k$ be a sequence of integrable non-negative functions on $Q$, and $h$ an integrable function on $Q$, satisfying $0 \le h(x) \le 1$. If $B \subset Q$ is a pavable set of volume 0, and if $\sum_{k=1}^{\infty} h_k(x) \ge h(x)$ when $x \notin B$, then

$$\sum_{k=1}^{\infty}\int_Q h_k(x)\,|d^nx| \ge \int_Q h(x)\,|d^nx|. \tag{A18.5}$$
Proof. Set $g_k = \sum_{i=1}^{k} h_i$, which is a non-decreasing sequence of non-negative integrable functions, and $\tilde g_k = \inf(g_k, h)$, which is still a non-decreasing sequence of non-negative integrable functions. Finally, set $f_k = h - \tilde g_k$; these functions satisfy the hypotheses of Proposition A18.1. So

$$\begin{aligned}
0 = \lim_{k\to\infty}\int f_k\,|d^nx| &= \int h\,|d^nx| - \lim_{k\to\infty}\int \tilde g_k\,|d^nx| \ge \int h\,|d^nx| - \lim_{k\to\infty}\int g_k\,|d^nx|\\
&= \int h\,|d^nx| - \sum_{k=1}^{\infty}\int h_k\,|d^nx|. \quad\square
\end{aligned} \tag{A18.6}$$


Simplifications to the dominated convergence theorem 


Since $f$ is the limit of the $f_k$, and we have assumed $f = 0$ (and therefore $\int f = 0$), we need to show that $L$, the limit of the integrals, is also 0.


Let us simplify the statement of Theorem 4.11.12. First, by subtracting $f$ from all the $f_k$, and replacing $g$ by $g + |f|$, we may assume $f = 0$.

Second, by writing $f_k = f_k^+ - f_k^-$, we see that it is enough to prove the result when all $f_k$ satisfy $f_k \ge 0$.

Third, since when $f_k \ge 0$,

$$0 \le \int_{\mathbb{R}^n} f_k\,|d^nx| \le \int_{\mathbb{R}^n} g\,|d^nx|, \tag{A18.7}$$

by passing to a subsequence we may assume that $\lim_{k\to\infty}\int_{\mathbb{R}^n} f_k(x)\,|d^nx|$ exists. Call that limit $L$.

If $L \ne 0$, there exists $R$ such that

$$\left|\int_{\mathbb{R}^n} g\,|d^nx| - \int_{\mathbb{R}^n} [g]_R\,|d^nx|\right| < L/2. \tag{A18.8}$$


The point of this argument about "if $L \ne 0$" is to show that if there is a counterexample to Theorem 4.11.12, there is a counterexample when the functions are bounded by a single constant and have support in a single bounded set. So it is sufficient to prove the statement for such functions.


It is then also true that

$$\Biggl|\overbrace{\int_{\mathbb{R}^n} f_k\,|d^nx|}^{\text{limit of this is } L} - \int_{\mathbb{R}^n} [f_k]_R\,|d^nx|\Biggr| < L/2. \tag{A18.9}$$


Thus passing to a further subsequence if necessary, we may assume that

$$\lim_{k\to\infty}\int_{\mathbb{R}^n} [f_k]_R\,|d^nx| \ge L/2. \tag{A18.10}$$

Thus if the theorem is false, it will also be false for the functions $[f_k]_R$, so it is enough to prove the theorem for $f_k$ satisfying $0 \le f_k \le R$, with support in the ball of radius $R$. By replacing $f_k$ by $f_k/R$, we may assume that our functions are bounded by 1, and by covering the ball of radius $R$ by dyadic cubes of side 1 and making the argument for each separately, we may assume that all functions have support in one such cube.

To lighten notation, let us restate our theorem after all these simplifications. 
The main simplification is that the functions $f_k$ all have their support in a single bounded set, the unit cube.

When we call Proposition A18.3 a "simplified" version of the dominated convergence theorem, we don't mean that its proof is simple. It is among the harder proofs in this book, and certainly it is the trickiest.


Proposition A18.3 (Simplified dominated convergence theorem). Suppose $f_k$ is a sequence of integrable functions all satisfying $0 \le f_k \le 1$, and all having their support in the unit cube $Q$. If there exists a pavable subset $B \subset Q$ with $\mathrm{vol}_n(B) = 0$ such that $f_k(x) \to 0$ when $x \notin B$, then

$$\lim_{k\to\infty}\int_{\mathbb{R}^n} f_k\,|d^nx| = \int_{\mathbb{R}^n}\lim_{k\to\infty} f_k\,|d^nx| = 0.$$


Proof of the dominated convergence theorem 


We will prove the dominated convergence theorem by proving Proposition A18.3. By passing to a subsequence, we may assume that $\lim_{k\to\infty}\int_{\mathbb{R}^n} f_k\,|d^nx| = C$; we will assume that $C > 0$ and derive a contradiction. Let us consider the set $K_p$ of linear combinations

$$\sum_{m=p}^{\infty} a_m f_m \tag{A18.11}$$

with all $a_m \ge 0$, all but finitely many zero (so that the sum is actually finite), and $\sum_{m=p}^{\infty} a_m = 1$. Note that the functions in $K_p$ are all integrable (since they are finite linear combinations of integrable functions), all bounded by 1, and all have support in $Q$.

We will need two properties of the functions $g \in K_p$. First, for any $x \in Q - B$, and any sequence $g_p \in K_p$, we will have $\lim_{p\to\infty} g_p(x) = 0$. Indeed, for any $\varepsilon > 0$ we can find $N$ such that all $f_m(x)$ satisfy $0 \le f_m(x) \le \varepsilon$ when $m > N$, so that when $p > N$ we have

$$g_p(x) = \sum_{m=p}^{\infty} a_m f_m(x) \le \sum_{m=p}^{\infty} (a_m\varepsilon) = \varepsilon. \tag{A18.12}$$


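Second, again if $g_p \in K_p$, we have $\lim_{p\to\infty}\int_Q g_p\,|d^nx| = C$. Indeed, choose $\varepsilon > 0$, and $N$ so large that $\bigl|\int_Q f_m\,|d^nx| - C\bigr| < \varepsilon$ when $m > N$. Then, when $p > N$ we have

$$\begin{aligned}
\left|\int_Q g_p(x)\,|d^nx| - C\right| &= \left|\sum_{m=p}^{\infty} a_m\Bigl(\int_Q f_m(x)\,|d^nx|\Bigr) - C\right|\\
&= \left|\sum_{m=p}^{\infty} a_m\Bigl(\int_Q f_m(x)\,|d^nx| - C\Bigr)\right| \le \sum_{m=p}^{\infty}(a_m\varepsilon) = \varepsilon.
\end{aligned} \tag{A18.13}$$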

Let $d_p = \inf_{g\in K_p}\int_Q g^2(x)\,|d^nx|$. Clearly the $d_p$ form a non-decreasing sequence bounded by 1, hence convergent. Choose $g_p \in K_p$ so that $\int_Q g_p^2 < d_p + 1/p$.



The appearance of integrals of squares of functions in this argument appears to be quite unnatural. The reason they are used is that it is possible to express $(g_p - g_q)^2$ algebraically in terms of $(g_p + g_q)^2$, $g_p^2$, and $g_q^2$. We could write

$$|g_p - g_q| = 2\sup(g_p, g_q) - g_p - g_q,$$

but we don't know much about $\sup(g_p, g_q)$.

The second inequality follows from Schwarz's lemma for integrals (Exercise A18.3). Write
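Lemma A18.4. For all $\varepsilon > 0$, there exists $N$ such that when $p, q > N$,

$$\int_Q (g_p - g_q)^2\,|d^nx| < \varepsilon. \tag{A18.14}$$

Proof of Lemma A18.4. Algebra says that

$$\int_Q \Bigl(\tfrac12(g_p - g_q)\Bigr)^2|d^nx| + \int_Q \Bigl(\tfrac12(g_p + g_q)\Bigr)^2|d^nx| = \frac12\int_Q g_p^2\,|d^nx| + \frac12\int_Q g_q^2\,|d^nx|. \tag{A18.15}$$

But $\tfrac12(g_p + g_q)$ is itself in $K_N$, so $\int_Q \bigl(\tfrac12(g_p + g_q)\bigr)^2|d^nx| \ge d_N$, so

$$\int_Q \Bigl(\tfrac12(g_p - g_q)\Bigr)^2|d^nx| \le \frac12\Bigl(d_p + \frac1p\Bigr) + \frac12\Bigl(d_q + \frac1q\Bigr) - d_N.$$

Since the $d_p$ converge, we see that this can be made arbitrarily small. □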
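The algebraic identity A18.15 is the parallelogram law applied to integrals of squares. The sketch below checks it numerically for two sample functions on $[0,1]$; the functions, the midpoint rule and the names are assumptions for illustration, not part of the proof.

```python
def midpoint_integral(func, cells=1000):
    """Midpoint Riemann sum of func over [0, 1]."""
    return sum(func((j + 0.5) / cells) for j in range(cells)) / cells


def g_p(x):
    # First sample function (hypothetical).
    return x * (1.0 - x)


def g_q(x):
    # Second sample function (hypothetical).
    return 0.5 * x ** 3


# Left-hand side: integrals of the squared half-difference and half-sum.
lhs = (midpoint_integral(lambda x: (0.5 * (g_p(x) - g_q(x))) ** 2)
       + midpoint_integral(lambda x: (0.5 * (g_p(x) + g_q(x))) ** 2))
# Right-hand side: half the sum of the integrals of the squares.
rhs = (0.5 * midpoint_integral(lambda x: g_p(x) ** 2)
       + 0.5 * midpoint_integral(lambda x: g_q(x) ** 2))
```

Because the identity holds pointwise, the two Riemann sums agree up to rounding for any pair of functions and any grid.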


Using this lemma, we can choose a further subsequence $h_q$ of the $g_p$ so that

$$\sum_{q=1}^{\infty}\left(\int_Q (h_q - h_{q+1})^2\,|d^nx|\right)^{1/2} \tag{A18.17}$$

converges. Notice that

$$h_q(x) = (h_q - h_{q+1})(x) + (h_{q+1} - h_{q+2})(x) + \dots \quad\text{when } x \notin B, \tag{A18.18}$$

since

$$h_q(x) - \sum_{i=q}^{m}(h_i - h_{i+1})(x) = h_{m+1}(x), \tag{A18.19}$$

which tends to 0 when $m \to \infty$ and $x \notin B$ by Equation A18.12.

In particular, $h_q \le \sum_{m=q}^{\infty}|h_m - h_{m+1}|$, and we can apply Lemma A18.2 to get the first inequality below; the second follows from Schwarz's lemma for integrals:

$$\int_Q h_q\,|d^nx| \le \sum_{m=q}^{\infty}\int_Q |h_m - h_{m+1}|\,|d^nx| \le \sum_{m=q}^{\infty}\left(\int_Q (h_m - h_{m+1})^2\,|d^nx|\right)^{1/2}. \tag{A18.20}$$

The sum on the right can be made arbitrarily small by taking $q$ sufficiently large. This contradicts Equation A18.13, and the assumption $C > 0$. This proves Proposition A18.3, hence also Theorem 4.11.12. □


A.19 Justifying the Change of Parametrization

Before restating and proving Theorem 5.2.8, we will prove the following proposition, which we will need in our proof. The proposition also explains why Definition 5.2.1 of $k$-dimensional volume 0 of a subset of $\mathbb{R}^n$ is reasonable.
We could state Proposition A19.1 projecting onto any $k$ coordinates.



Figure A19.1. Here $X_1$ consists of the dark line at the top of the rectangle at left, which is mapped by $\gamma_1$ to a pole and then by $\gamma_2^{-1}$ to a point in the rectangle at right. The dark box in the rectangle at left is $Y_2$, which is mapped to a pole of $\gamma_2$ and then to the dark line at right. Excluding $X_1$ from the domain of $\Phi$ ensures that it is injective (one to one); excluding $Y_2$ ensures that it is well defined. Excluding $X_2$ and $Y_1$ from the range ensures that it is surjective (onto).


Proposition A19.1. If $X \subset \mathbb{R}^n$ is a bounded subset of $k$-dimensional volume 0, then its projection onto the first $k$ coordinates also has $k$-dimensional volume 0.


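Proof. Let $\pi : \mathbb{R}^n \to \mathbb{R}^k$ denote the projection of $\mathbb{R}^n$ onto the first $k$ coordinates. Choose $\varepsilon > 0$, and $N$ so large that

$$\sum_{\substack{C\in\mathcal{D}_N(\mathbb{R}^n)\\ C\cap X\ne\emptyset}} \Bigl(\frac{1}{2^N}\Bigr)^k < \varepsilon. \tag{A19.1}$$

Then

$$\varepsilon > \sum_{\substack{C\in\mathcal{D}_N(\mathbb{R}^n)\\ C\cap X\ne\emptyset}} \Bigl(\frac{1}{2^N}\Bigr)^k \ge \sum_{\substack{C_1\in\mathcal{D}_N(\mathbb{R}^k)\\ C_1\cap\pi(X)\ne\emptyset}} \Bigl(\frac{1}{2^N}\Bigr)^k, \tag{A19.2}$$

since for every $C_1 \in \mathcal{D}_N(\mathbb{R}^k)$ such that $C_1 \cap \pi(X) \ne \emptyset$, there is at least one $C \in \mathcal{D}_N(\mathbb{R}^n)$ with $C \subset \pi^{-1}(C_1)$ such that $C \cap X \ne \emptyset$. Thus $\mathrm{vol}_k(\pi(X)) < \varepsilon$ for any $\varepsilon > 0$. □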
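The counting argument behind Equation A19.2 can be tried on a finite set. In this sketch the sample set `X` in $\mathbb{R}^3$, the choice $k = 2$, and the convention that dyadic cubes are half-open are assumptions made for illustration; for a finite set, the cubes meeting it are exactly the cubes containing one of its points.

```python
import math


def dyadic_index(point, level):
    """Integer index of the half-open dyadic cube of side 2**-level containing point."""
    return tuple(math.floor(coord * 2 ** level) for coord in point)


def weighted_count(points, level, k):
    """Sum of (1/2**level)**k over the level-`level` dyadic cubes meeting the finite set."""
    cubes = {dyadic_index(p, level) for p in points}
    return len(cubes) * (0.5 ** level) ** k


# Hypothetical finite sample set in R^3, projected onto the first k = 2 coordinates.
X = [(j / 7.0, (j * j % 11) / 11.0, (3 * j % 5) / 5.0) for j in range(20)]
k = 2
pi_X = [p[:k] for p in X]

sums = [(weighted_count(X, N, k), weighted_count(pi_X, N, k)) for N in range(12)]
```

At every level the sum for the projection is at most the sum for the original set, because projecting the index of a cube gives the index of the projected cube, so distinct cubes downstairs come from distinct cubes upstairs.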


Remark. The sum to the far right of Equation A19.2 is precisely our old definition of volume, $\mathrm{vol}_k$ in this case; we are summing over cubes $C_1$ that are in $\mathbb{R}^k$. In the sum to its left, we have the side length to the $k$th power for cubes in $\mathbb{R}^n$; it's less clear what that is measuring. △


Justifying the change of parametrization 

Now we will restate and prove Theorem 5.2.8, which explains why we can apply the change of variables formula to the function giving change of parametrization.

Let $U_1$ and $U_2$ be subsets of $\mathbb{R}^k$, and let $\gamma_1$ and $\gamma_2$ be two parametrizations of a $k$-dimensional manifold $M$:

$$\gamma_1 : U_1 \to M \quad\text{and}\quad \gamma_2 : U_2 \to M. \tag{A19.3}$$

Following the notation of Definition 5.2.2, denote by $X_1$ the negligible "trouble spots" of $\gamma_1$, and by $X_2$ the trouble spots of $\gamma_2$ (illustrated by Figure A19.1, which we already saw in Section 5.2). Call

$$Y_1 = (\gamma_2^{-1}\circ\gamma_1)(X_1), \quad\text{and}\quad Y_2 = (\gamma_1^{-1}\circ\gamma_2)(X_2). \tag{A19.4}$$
Theorem 5.2.8. Both $U_1^{ok} = U_1 - (X_1 \cup Y_2)$ and $U_2^{ok} = U_2 - (X_2 \cup Y_1)$ are open subsets of $\mathbb{R}^k$ with boundaries of $k$-dimensional volume 0, and

$$\Phi : U_1^{ok} \to U_2^{ok} = \gamma_2^{-1}\circ\gamma_1$$

is a $C^1$ diffeomorphism with locally Lipschitz inverse.

Proof. The mapping $\Phi$ is well defined and injective on $U_1^{ok}$. It is well defined because its domain excludes $Y_2$; it is injective because its domain excludes $X_1$.

We need to check two different kinds of things: that $\Phi : U_1^{ok} \to U_2^{ok}$ is a diffeomorphism with locally Lipschitz derivative, and that the boundaries of $U_1^{ok}$ and $U_2^{ok}$ have volume 0.

For the first part, it is enough to show that $\Phi$ is of class $C^1$ with locally Lipschitz derivative, since the same proof applied to

$$\Phi^{-1} = \gamma_1^{-1}\circ\gamma_2 : U_2^{ok} \to U_1^{ok} \tag{A19.5}$$

will show that the inverse is also of class $C^1$ with locally Lipschitz derivative. Everything about the differentiability stems from the following lemma.

Lemma A19.2. Let $M \subset \mathbb{R}^n$ be a $k$-dimensional manifold, $U_1, U_2 \subset \mathbb{R}^k$, and $\gamma_1 : U_1 \to M$, $\gamma_2 : U_2 \to M$ be two maps of class $C^1$ with Lipschitz derivative, with derivatives that are injective. Suppose that $\gamma_1(x_1) = \gamma_2(x_2) = x$. Then there exist neighborhoods $V_1$ of $x_1$ and $V_2$ of $x_2$ such that $\gamma_2^{-1}\circ\gamma_1$ is defined on $V_1$ and is a diffeomorphism of $V_1$ onto $V_2$.

This looks quite a lot like the chain rule, which asserts that a composition of $C^1$ mappings is $C^1$, and that the derivative of the composition is the composition of the derivatives. The difficulty in simply applying the chain rule is that we have not defined what it means for $\gamma_2^{-1}$ to be differentiable, since it is only defined on a subset of $M$, not on an open subset of $\mathbb{R}^n$. It is quite possible (and quite important) to define what it means for a function defined on a manifold (or on a subset of a manifold) to be differentiable, and to state an appropriate chain rule, etc., but we decided not to do it in this book, and here we pay for that decision.

Proof. By our definition of a manifold, there exist subspaces $E_1, E_2$ of $\mathbb{R}^n$, an open subset $W \subset E_1$, and a mapping $f : W \to E_2$ such that near $x$, $M$ is the graph of $f$. Let $\pi_1 : \mathbb{R}^n \to E_1$ denote the projection of $\mathbb{R}^n$ onto $E_1$, and denote by $F : W \to \mathbb{R}^n$ the mapping

$$F(y) = y + f(y) \tag{A19.6}$$

so that $\pi_1(F(y)) = y$.
Consider the mapping $\pi_1\circ\gamma_2$ defined on some neighborhood of $x_2$, and with values in some neighborhood of $\pi_1(x)$. Both domain and range are open subsets of $\mathbb{R}^k$, and $\pi_1\circ\gamma_2$ is of class $C^1$. Moreover, $[D(\pi_1\circ\gamma_2)(x_2)]$ is invertible, for the following reason. The derivative

$$[D\gamma_2(x_2)]$$

is injective, and its image is contained in (in fact is exactly) the tangent space $T_xM$. The mapping $\pi_1$ has as its kernel $E_2$, which intersects $T_xM$ only at the origin. Thus the kernel of $[D(\pi_1\circ\gamma_2)(x_2)]$ is $\{0\}$, which means that $[D(\pi_1\circ\gamma_2)(x_2)]$ is injective. But the domain and range are of the same dimension $k$, so $[D(\pi_1\circ\gamma_2)(x_2)]$ is invertible.

We can thus apply the inverse function theorem, to assert that there exists a neighborhood $W_1$ of $\pi_1(x)$ in which $\pi_1\circ\gamma_2$ has a $C^1$ inverse. In fact, the inverse is precisely $\gamma_2^{-1}\circ F$, which is therefore of class $C^1$ on $W_1$. Furthermore, on the graph, i.e., on $M$, $F\circ\pi_1$ is the identity.

Now write

$$\gamma_2^{-1}\circ\gamma_1 = \gamma_2^{-1}\circ F\circ\pi_1\circ\gamma_1. \tag{A19.7}$$

Why not four? Because $\gamma_2^{-1}\circ F$ should be viewed as a single mapping, which we just saw is differentiable. We don't have a definition of what it would mean for $\gamma_2^{-1}$ by itself to be differentiable.

This represents $\gamma_2^{-1}\circ\gamma_1$ as a composition of three (not four) $C^1$ mappings, defined on the neighborhood $\gamma_1^{-1}(F(W_1))$ of $x_1$, so the composition is of class $C^1$ by the chain rule. We leave it to you to check that the derivative is locally Lipschitz. To see that $\gamma_2^{-1}\circ\gamma_1$ is locally invertible, with invertible derivative, notice that we could make the argument exchanging $\gamma_1$ and $\gamma_2$, which would construct the inverse map. □ Lemma A19.2

We now know that $\Phi : U_1^{ok} \to U_2^{ok}$ is a diffeomorphism.

The only thing left to prove is that the boundaries of $U_1^{ok}$ and $U_2^{ok}$ have volume 0. It is enough to show it for $U_1^{ok}$. The boundary of $U_1^{ok}$ is contained in the union of

(1) the boundary of $U_1$, which has volume 0 by hypothesis;

(2) $X_1$, which has volume 0 by hypothesis; and

(3) $Y_2$, which also has volume 0, although this is not obvious.

First, it is clearly enough to show that $Y_2 - X_1$ has volume 0; the part of $Y_2$ contained in $X_1$ (if any) is taken care of since $X_1$ has volume 0. Next, it is enough to prove that every point $y \in Y_2 - X_1$ has a neighborhood $W_1$ such that $Y_2 \cap W_1$ has volume 0; we will choose a neighborhood on which $\gamma_1^{-1}\circ F$ is a diffeomorphism. We can write

$$Y_2 = \gamma_1^{-1}(\gamma_2(X_2)) = \gamma_1^{-1}\circ F\circ\pi_1\circ\gamma_2(X_2). \tag{A19.8}$$

By hypothesis, $\gamma_2(X_2)$ has $k$-dimensional volume 0, so by Proposition A19.1, $\pi_1\circ\gamma_2(X_2)$ also has volume 0. Therefore, the result follows from Proposition A16.1, as applied to $\gamma_1^{-1}\circ F$. □
A.20 Computing the Exterior Derivative

Theorem 6.7.3 (Computing the exterior derivative of a $k$-form).

(a) If the coefficients $a$ of the $k$-form

$$\varphi = \sum_{1\le i_1<\dots<i_k\le n} a_{i_1,\dots,i_k}\,dx_{i_1}\wedge\dots\wedge dx_{i_k} \tag{6.7.4}$$

are $C^2$ functions on $U \subset \mathbb{R}^n$, then the limit in Equation 6.7.3 exists, and defines a $(k+1)$-form.

(b) The exterior derivative is linear over $\mathbb{R}$: if $\varphi$ and $\psi$ are $k$-forms on $U \subset \mathbb{R}^n$, and $a$ and $b$ are numbers (not functions), then

$$d(a\varphi + b\psi) = a\,d\varphi + b\,d\psi. \tag{6.7.5}$$

(c) The exterior derivative of a constant form is 0.

(d) The exterior derivative of the 0-form (i.e., function) $f$ is given by the formula

$$df = [Df] = \sum_{i=1}^{n}(D_if)\,dx_i. \tag{6.7.6}$$

(e) If $f$ is a function, then

$$d(f\,dx_{i_1}\wedge\dots\wedge dx_{i_k}) = df\wedge dx_{i_1}\wedge\dots\wedge dx_{i_k}. \tag{6.7.7}$$

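Proof. First, let us prove part (d): the exterior derivative of a 0-form field, i.e., of a function, is just its derivative. This is a restatement of Theorem 1.7.12:

$$df(P_{\mathbf x}(\vec v)) \overset{\text{Def. 6.7.1}}{=} \lim_{h\to0}\frac1h\bigl(f(\mathbf x + h\vec v) - f(\mathbf x)\bigr) = [Df(\mathbf x)]\vec v = [Df(\mathbf x)]^{\top}\cdot\vec v. \tag{A20.1}$$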
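Part (d) says that the limit defining $df$ on a single vector is the directional derivative. The sketch below compares the difference quotient with $[Df(\mathbf x)]\vec v$ for a sample function; the function `f`, its hand-computed derivative `Df`, and the point and vector used are assumptions for illustration.

```python
import math


def f(x, y):
    # Sample function (hypothetical).
    return x * x * y + math.sin(y)


def Df(x, y):
    """Row matrix [D_1 f, D_2 f] of the sample f, computed by hand."""
    return (2.0 * x * y, x * x + math.cos(y))


def difference_quotient(p, v, h):
    """(f(p + h v) - f(p)) / h, the quantity whose limit defines df on P_p(v)."""
    return (f(p[0] + h * v[0], p[1] + h * v[1]) - f(p[0], p[1])) / h


p, v = (1.0, 2.0), (0.5, -1.5)
exact = Df(*p)[0] * v[0] + Df(*p)[1] * v[1]
approximations = [difference_quotient(p, v, 10.0 ** -j) for j in range(1, 7)]
```

The entries of `approximations` approach `exact` roughly linearly in $h$, as expected for a one-sided difference quotient.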

Now let us prove part (e), that

$$d(f\,dx_{i_1}\wedge\dots\wedge dx_{i_k}) = df\wedge dx_{i_1}\wedge\dots\wedge dx_{i_k}. \tag{A20.2}$$

It is enough to prove the result at the origin; this amounts to translating $\varphi$, and it simplifies the notation. The idea is to write $f = T^0(f) + T^1(f) + R(f)$ as a Taylor polynomial with remainder at the origin, where

the constant term is $T^0(f)(\mathbf x) = f(0)$,

the linear term is $T^1(f)(\mathbf x) = D_1f(0)x_1 + \dots + D_nf(0)x_n = [Df(0)]\mathbf x$,

the remainder is $|R(\mathbf x)| \le C|\mathbf x|^2$, for some constant $C$.

We will then see that only the linear terms contribute to the limit.

Since $\varphi$ is a $k$-form, the exterior derivative $d\varphi$ is a $(k+1)$-form; evaluating it on $k+1$ vectors involves integrating over the boundary (i.e., the faces) of $P_0(h\vec v_1, \dots, h\vec v_{k+1})$. We can parametrize those faces by the $2(k+1)$ mappings

$$\begin{aligned}
\gamma_{1,i}\begin{pmatrix}t_1\\ \vdots\\ t_k\end{pmatrix} = \gamma_{1,i}(\mathbf t) &= h\vec v_i + t_1\vec v_1 + \dots + t_{i-1}\vec v_{i-1} + t_i\vec v_{i+1} + \dots + t_k\vec v_{k+1},\\
\gamma_{0,i}\begin{pmatrix}t_1\\ \vdots\\ t_k\end{pmatrix} = \gamma_{0,i}(\mathbf t) &= t_1\vec v_1 + \dots + t_{i-1}\vec v_{i-1} + t_i\vec v_{i+1} + \dots + t_k\vec v_{k+1},
\end{aligned}$$

for $i$ from 1 to $k+1$, and where $0 \le t_j \le h$ for each $j = 1, \dots, k$. We will denote by $Q_h$ the domain of this parametrization.

We need to parametrize the faces of $P_0(h\vec v_1, \dots, h\vec v_{k+1})$ because we only know how to integrate over parametrized domains.

There are $k+1$ mappings $\gamma_{1,i}(\mathbf t)$, one for each $i$ from 1 to $k+1$: for each mapping, a different $\vec v_i$ is omitted. The same is true for $\gamma_{0,i}(\mathbf t)$.

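Notice that $\gamma_{1,i}$ and $\gamma_{0,i}$ have the same partial derivatives, the $k$ vectors $\vec v_1, \dots, \vec v_{k+1}$, excluding the vector $\vec v_i$; we will write the integrals over these faces under the same integral sign. So we can write the exterior derivative as the limit as $h \to 0$ of the sum

$$\sum_{i=1}^{k+1}(-1)^{i-1}\frac{1}{h^{k+1}}\int_{Q_h}\underbrace{\bigl(f(\gamma_{1,i}(\mathbf t)) - f(\gamma_{0,i}(\mathbf t))\bigr)}_{\text{coefficient (function of } \mathbf t)}\ \underbrace{dx_{i_1}\wedge\dots\wedge dx_{i_k}}_{k\text{-form}}\ \underbrace{(\vec v_1, \dots, \widehat{\vec v_i}, \dots, \vec v_{k+1})}_{\substack{k \text{ vectors: partial derivatives}\\ \text{of } \gamma_{1,i} \text{ and } \gamma_{0,i}}}\,|d^k\mathbf t|, \tag{A20.3}$$

where each term

$$\int_{Q_h}\bigl(f(\gamma_{1,i}(\mathbf t)) - f(\gamma_{0,i}(\mathbf t))\bigr)\,dx_{i_1}\wedge\dots\wedge dx_{i_k}(\vec v_1, \dots, \widehat{\vec v_i}, \dots, \vec v_{k+1})\,|d^k\mathbf t| \tag{A20.4}$$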


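is the sum of three terms, of which the second is the only one that counts (most of the work is in proving that the third one doesn't count):

$$\begin{aligned}
&\text{constant term:} && \int_{Q_h}\bigl(T^0(f)(\gamma_{1,i}(\mathbf t)) - T^0(f)(\gamma_{0,i}(\mathbf t))\bigr)\,dx_{i_1}\wedge\dots\wedge dx_{i_k}(\vec v_1, \dots, \widehat{\vec v_i}, \dots, \vec v_{k+1})\,|d^k\mathbf t|\ +\\
&\text{linear term:} && \int_{Q_h}\bigl(T^1(f)(\gamma_{1,i}(\mathbf t)) - T^1(f)(\gamma_{0,i}(\mathbf t))\bigr)\,dx_{i_1}\wedge\dots\wedge dx_{i_k}(\vec v_1, \dots, \widehat{\vec v_i}, \dots, \vec v_{k+1})\,|d^k\mathbf t|\ +\\
&\text{remainder:} && \int_{Q_h}\bigl(R(f)(\gamma_{1,i}(\mathbf t)) - R(f)(\gamma_{0,i}(\mathbf t))\bigr)\,dx_{i_1}\wedge\dots\wedge dx_{i_k}(\vec v_1, \dots, \widehat{\vec v_i}, \dots, \vec v_{k+1})\,|d^k\mathbf t|.
\end{aligned}$$

In Equation A20.6 the derivatives are evaluated at 0 because that is where the Taylor polynomial is being computed.

The second equality in Equation A20.6 comes from linearity.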
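The constant term cancels, since

$$\underbrace{T^0(f)(\text{anything})}_{\text{constant}} - \underbrace{T^0(f)(\text{anything})}_{\text{same constant}} = 0. \tag{A20.5}$$

For the second term, note that

$$\begin{aligned}
T^1(f)(\gamma_{1,i}(\mathbf t)) - T^1(f)(\gamma_{0,i}(\mathbf t)) &= [Df(0)]\bigl(\underbrace{h\vec v_i + \gamma_{0,i}(\mathbf t)}_{\gamma_{1,i}(\mathbf t)}\bigr) - [Df(0)]\bigl(\gamma_{0,i}(\mathbf t)\bigr)\\
&= h[Df(0)]\vec v_i,
\end{aligned} \tag{A20.6}$$

which is a constant with respect to $\mathbf t$, so the entire sum for the linear terms becomes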
becomes 
In the third line of Equation A20.7, where does the $h^{k+1}$ in the numerator come from? One $h$ comes from the $h[Df(0)]\vec v_i$ in Equation A20.6. The other $h^k$ come from the fact that we are integrating over $Q_h$, a cube of side length $h$ in $\mathbb{R}^k$.

The last equality in Equation A20.7 explains why we defined the oriented boundary as we did, each part of the boundary being given the sign $(-1)^{i-1}$: it was to make it compatible with the wedge product. Exercise A20.1 asks you to elaborate on our statement that this equality is "by the definition of the wedge product."

$$\begin{aligned}
&\sum_{i=1}^{k+1}(-1)^{i-1}\frac{1}{h^{k+1}}\int_{Q_h}\bigl(T^1(f)(\gamma_{1,i}(\mathbf t)) - T^1(f)(\gamma_{0,i}(\mathbf t))\bigr)\,dx_{i_1}\wedge\dots\wedge dx_{i_k}\\
&\qquad\qquad(\vec v_1, \dots, \widehat{\vec v_i}, \dots, \vec v_{k+1})\,|d^k\mathbf t|\\
&\quad= \frac{h^{k+1}}{h^{k+1}}\sum_{i=1}^{k+1}(-1)^{i-1}\bigl([Df(0)]\vec v_i\bigr)\,dx_{i_1}\wedge\dots\wedge dx_{i_k}(\vec v_1, \dots, \widehat{\vec v_i}, \dots, \vec v_{k+1})\\
&\quad= (df\wedge dx_{i_1}\wedge\dots\wedge dx_{i_k})\bigl(P_0(\vec v_1, \dots, \vec v_{k+1})\bigr)
\end{aligned} \tag{A20.7}$$

by the definition of the wedge product.
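The simplest nontrivial case of part (e) can be checked numerically: for the 1-form $f\,dx_2$ on $\mathbb{R}^2$, the integral over the oriented boundary of $P_0(h\vec v_1, h\vec v_2)$, divided by $h^2$, should approach $(df\wedge dx_2)(\vec v_1, \vec v_2)$. In this sketch the coefficient `f`, the vectors and the midpoint rule along each edge are assumptions for illustration; the edges are traversed with the signs $(-1)^{i-1}$ of the oriented boundary described above.

```python
def f(x, y):
    # Sample coefficient (hypothetical); its derivative at the origin is [3, 2].
    return 3.0 * x + 2.0 * y + x * y


def edge_integral(a, b, steps=200):
    """Midpoint-rule integral of the 1-form f dx_2 along the segment from a to b."""
    dy = b[1] - a[1]
    total = 0.0
    for j in range(steps):
        t = (j + 0.5) / steps
        total += f(a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
    return total * dy / steps


def boundary_quotient(v1, v2, h):
    """(1/h^2) times the integral of f dx_2 over the oriented boundary of P_0(h v1, h v2)."""
    o = (0.0, 0.0)
    p1 = (h * v1[0], h * v1[1])
    p2 = (h * v2[0], h * v2[1])
    p12 = (p1[0] + p2[0], p1[1] + p2[1])
    loop = [(o, p1), (p1, p12), (p12, p2), (p2, o)]
    return sum(edge_integral(a, b) for a, b in loop) / h ** 2


def wedge_value(v1, v2):
    """(df ^ dx_2)(v1, v2) at the origin for the sample f."""
    df = lambda v: 3.0 * v[0] + 2.0 * v[1]
    dx2 = lambda v: v[1]
    return df(v1) * dx2(v2) - df(v2) * dx2(v1)


v1, v2 = (1.0, 0.5), (0.2, 1.0)
quotient = boundary_quotient(v1, v2, 1e-3)
target = wedge_value(v1, v2)
```

For this `f` the discrepancy between `quotient` and `target` is of order $h$; it comes entirely from the quadratic term $xy$, which is the remainder $R(f)$ treated next.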


Now for the remainder. Since we have taken into account the constant and 
linear terms, we can expect that the remainder will be at most of order h 2 , and 
Theorem A9.7, the version of Taylor’s theorem that gives an explicit bound for 
the remainder, is the tool we need. We will use the following version of that 
theorem, for Taylor polynomials of degree 1: 


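\[ \bigl| f(a + \vec{h}) - P^1_{f,a}(a + \vec{h}) \bigr| \le C \Bigl( \sum_{i=1}^{n} |h_i| \Bigr)^2, \tag{A20.8} \]

where

\[ \sup_{I \in \mathcal{I}_n^2} \; \sup_{c \in [a, a+\vec{h}]} |D_I f(c)| \le C. \tag{A20.9} \]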


This more or less obviously gives 


\[ |R(f)(\gamma_{0,i}(t))| \le K h^2 \quad\text{and}\quad |R(f)(\gamma_{1,i}(t))| \le K h^2, \tag{A20.10} \]


where K is some number concocted out of the second derivatives of f and the
lengths of the $\vec{v}_j$. The following lemma gives a proof and a formula for K.


The 1-norm $\|\vec{v}\|_1$ is not to be confused with the norm $\|A\|$ of a
matrix A, discussed in Section 2.8 (Definition 2.8.5).

We see that the $\sum_{i=1}^{n} |h_i|$ in Equation A20.8 can be written
$\|\vec{h}\|_1$.


Lemma A20.1. Suppose that all second partials of f are bounded by C at
all points $\gamma_{0,i}(t)$ and $\gamma_{1,i}(t)$ when $t \in Q_h$. Then

\[ |R(f)(\gamma_{0,i}(t))| \le K h^2 \quad\text{and}\quad |R(f)(\gamma_{1,i}(t))| \le K h^2, \tag{A20.11} \]

where $K = C n (k + 1)^2 (\sup_j |\vec{v}_j|)^2$.

Proof. Let us denote $\|\vec{v}\|_1 = |v_1| + \dots + |v_n|$ (this actually is the correct
mathematical name). An easy computation shows that $\|\vec{v}\|_1 \le \sqrt{n} \, |\vec{v}|$ for any
vector $\vec{v}$.
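One way to carry out the “easy computation” is sketched below. It assumes the bound is meant to come from Schwarz’s inequality (Theorem 1.4.6), applied to the vector with entries $|v_i|$ and the vector whose n entries are all 1; the source does not say which argument it has in mind.

```latex
\|\vec{v}\|_1 = \sum_{i=1}^{n} 1 \cdot |v_i|
  \le \Bigl( \sum_{i=1}^{n} 1^2 \Bigr)^{1/2} \Bigl( \sum_{i=1}^{n} |v_i|^2 \Bigr)^{1/2}
  = \sqrt{n} \, |\vec{v}|.
```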

A bit of fiddling should convince you that

\[ \|\gamma_{0,i}(t)\|_1 \le |h| \bigl( \|\vec{v}_1\|_1 + \dots + \|\vec{v}_{k+1}\|_1 \bigr) \le |h| (k + 1) \sup_j \|\vec{v}_j\|_1 \le |h| (k + 1) \sqrt{n} \, \sup_j |\vec{v}_j|. \tag{A20.12} \]

Now Taylor’s theorem, Theorem A9.7, says that

\[ |R(f)(\gamma_{0,i}(t))| \le C \, \|\gamma_{0,i}(t)\|_1^2 \le h^2 C n (k + 1)^2 (\sup_j |\vec{v}_j|)^2 = h^2 K. \tag{A20.13} \]

The same calculation applies to $\gamma_{1,i}(t)$. □

Exercise A20.2 asks you to prove this. It is an application of Theorem 6.7.3.

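Using Lemma A20.1, we can see that the remainder disappears in the limit,
using

\[ |R(f)(\gamma_{1,i}(t)) - R(f)(\gamma_{0,i}(t))| \le |R(f)(\gamma_{1,i}(t))| + |R(f)(\gamma_{0,i}(t))| \le 2 h^2 K. \tag{A20.14} \]

Inserting this into the integral leads to

\[ \Bigl| \int_{Q_h} \underbrace{\bigl( R(f)(\gamma_{1,i}(t)) - R(f)(\gamma_{0,i}(t)) \bigr)}_{\le 2Kh^2} \, dx_{i_1} \wedge \dots \wedge dx_{i_k}(\vec{v}_1, \dots, \hat{\vec{v}}_i, \dots, \vec{v}_{k+1}) \, |d^k t| \Bigr| \]
\[ \le \int_{Q_h} \bigl( 2 h^2 K \, |dx_{i_1} \wedge \dots \wedge dx_{i_k}(\vec{v}_1, \dots, \hat{\vec{v}}_i, \dots, \vec{v}_{k+1})| \, |d^k t| \bigr) \]
\[ \le h^{k+2} K (\sup_j |\vec{v}_j|)^k, \tag{A20.15} \]

which still disappears in the limit after dividing by $h^{k+1}$.

This proves part (e). Now let us prove part (a):

\[ d \Bigl( \sum_{1 \le i_1 < \dots < i_k \le n} a_{i_1 \dots i_k} \, dx_{i_1} \wedge \dots \wedge dx_{i_k} \Bigr) \bigl( P_x(\vec{v}_1, \dots, \vec{v}_{k+1}) \bigr) \]
\[ = \lim_{h \to 0} \frac{1}{h^{k+1}} \int_{\partial P_x(h\vec{v}_1, \dots, h\vec{v}_{k+1})} \Bigl( \sum_{1 \le i_1 < \dots < i_k \le n} a_{i_1 \dots i_k} \, dx_{i_1} \wedge \dots \wedge dx_{i_k} \Bigr) \]
\[ = \sum_{1 \le i_1 < \dots < i_k \le n} \lim_{h \to 0} \frac{1}{h^{k+1}} \int_{\partial P_x(h\vec{v}_1, \dots, h\vec{v}_{k+1})} a_{i_1 \dots i_k} \, dx_{i_1} \wedge \dots \wedge dx_{i_k} \]
\[ = \sum_{1 \le i_1 < \dots < i_k \le n} \bigl( da_{i_1 \dots i_k} \wedge dx_{i_1} \wedge \dots \wedge dx_{i_k} \bigr) \bigl( P_x(\vec{v}_1, \dots, \vec{v}_{k+1}) \bigr), \tag{A20.16} \]

where the last equality is part (e).

This proves part (a); in particular, the limit in the second line exists because
the limit in the third line exists, by part (e). Part (b) is now clear, and (c)
follows immediately from (e) and (a). □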


The following result is one more basic building block in the theory of the exterior
derivative. It says that the exterior derivative of a wedge product
satisfies an analog of Leibniz’s rule for differentiating products. There is a sign
that comes in to complicate matters.

Theorem A20.2 (Derivative of wedge product). If $\varphi$ is a k-form and
$\psi$ is an l-form, then

\[ d(\varphi \wedge \psi) = d\varphi \wedge \psi + (-1)^k \varphi \wedge d\psi. \]
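To see the sign at work, here is a small worked check in $\mathbb{R}^3$. The forms $\varphi = y\,dx$ and $\psi = z\,dy$ (so k = l = 1) are chosen purely for illustration and do not come from the text.

```latex
\begin{aligned}
d(\varphi \wedge \psi) &= d(yz \, dx \wedge dy) = (z \, dy + y \, dz) \wedge dx \wedge dy
   = y \, dz \wedge dx \wedge dy = y \, dx \wedge dy \wedge dz, \\
d\varphi \wedge \psi &= (dy \wedge dx) \wedge (z \, dy) = 0, \\
(-1)^1 \varphi \wedge d\psi &= -\, y \, dx \wedge (dz \wedge dy) = y \, dx \wedge dy \wedge dz.
\end{aligned}
```

The two sides agree only because of the factor $(-1)^k = -1$; without it the right-hand side would be $-y \, dx \wedge dy \wedge dz$.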
A.21 The Pullback

To prove Stokes’s theorem, we will need a new notion: the pullback of form
fields.

Pullbacks and the Exterior Derivative 

The pullback describes how integrands transform under changes of variables. It 
has been used implicitly throughout Chapter 6, and indeed underlies the change 
of variables formula for integrals both in elementary calculus, and as developed 
in Section 4.10. When you write: “let x = f(u), so that dx = f'(u) du,” you
are computing a pullback, $f^* dx = f'(u) \, du$. Forms were largely invented to
keep track of such changes of variables in multiple integrals, so the pullback 
plays a central role in the subject. In this appendix we will give a bare bones 
treatment of the pullback; the central result is Theorem A21.8. 

The pullback by a linear transformation 

We will begin with the simplest case, pullbacks of forms by linear transformations.

TV is pronounced “T upper 

Definition A21.1 (Pullback by a linear transformation). Let V, W
be vector spaces, and $T : V \to W$ be a linear transformation. Then $T^*$ is a
linear transformation $A^k(W) \to A^k(V)$, defined as follows: if $\varphi$ is a k-form
on W, then

\[ T^*\varphi(\vec{v}_1, \dots, \vec{v}_k) = \varphi(T(\vec{v}_1), \dots, T(\vec{v}_k)). \tag{A21.1} \]

The pullback $T^*\varphi$, acting on k vectors $\vec{v}_1, \dots, \vec{v}_k$ in the domain of T,
gives the same result as $\varphi$, acting on the vectors $T(\vec{v}_1), \dots, T(\vec{v}_k)$ in the range.
Note that the domain and range can be of different dimensions: $T^*\varphi$ is a k-form
on V, while $\varphi$ is on W. But both forms must have the same degree: they both
act on the same number of vectors.

It is an immediate consequence of Definition A21.1 that $T^* : A^k(W) \to A^k(V)$
is linear:

\[ T^*(\varphi_1 + \varphi_2) = T^*\varphi_1 + T^*\varphi_2, \qquad T^*(a\varphi) = a \, T^*\varphi, \tag{A21.2} \]

as you are asked to show in Exercise A21.3.

The following proposition and the linearity of $T^*$ give a cumbersome but
straightforward way of computing the pullback of any form by a linear
transformation $T : \mathbb{R}^n \to \mathbb{R}^m$.

Determinants of minors, i.e., of square submatrices of matrices, occur in
many settings. The real meaning of this construction is given by
Proposition A21.2.


Proposition A21.2 (Computing the pullback by a linear transformation).
Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation, and denote by
$x_1, \dots, x_n$ the coordinates in $\mathbb{R}^n$ and by $y_1, \dots, y_m$ the coordinates in $\mathbb{R}^m$.
Then

\[ T^* dy_{i_1} \wedge \dots \wedge dy_{i_k} = \sum_{1 \le j_1 < \dots < j_k \le n} b_{j_1, \dots, j_k} \, dx_{j_1} \wedge \dots \wedge dx_{j_k}, \tag{A21.3} \]

where $b_{j_1, \dots, j_k}$ is the number obtained by taking the matrix of T, selecting
its rows $i_1, \dots, i_k$ in that order, and its columns $j_1, \dots, j_k$, and taking the
determinant of the resulting matrix.


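Example A21.3 (Computing the pullback). Let $T : \mathbb{R}^4 \to \mathbb{R}^3$ be the
linear transformation given by the matrix

\[ [T] = \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix}; \tag{A21.4} \]

then

\[ T^* dy_2 \wedge dy_3 = b_{1,2} \, dx_1 \wedge dx_2 + b_{1,3} \, dx_1 \wedge dx_3 + b_{1,4} \, dx_1 \wedge dx_4 + b_{2,3} \, dx_2 \wedge dx_3 + b_{2,4} \, dx_2 \wedge dx_4 + b_{3,4} \, dx_3 \wedge dx_4, \tag{A21.5} \]

where

\[ b_{1,2} = \det \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} = 0, \quad
   b_{1,3} = \det \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} = 0, \quad
   b_{1,4} = \det \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix} = 0, \]
\[ b_{2,3} = \det \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = 1, \quad
   b_{2,4} = \det \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} = 1, \quad
   b_{3,4} = \det \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix} = -1. \tag{A21.6} \]

So

\[ T^* dy_2 \wedge dy_3 = dx_2 \wedge dx_3 + dx_2 \wedge dx_4 - dx_3 \wedge dx_4. \tag{A21.7} \]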
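The recipe of Proposition A21.2 is mechanical enough to check by machine. The following is a minimal sketch, not part of the original text; the helper names `det` and `pullback_coefficients` are made up for this illustration. It recomputes the six coefficients of Example A21.3 from the minors of [T].

```python
from itertools import combinations

def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def pullback_coefficients(T, rows):
    """Coefficients b_{j1..jk} of T^*(dy_{i1} ^ ... ^ dy_{ik}); indices are 1-based."""
    n = len(T[0])
    k = len(rows)
    coeffs = {}
    for cols in combinations(range(1, n + 1), k):
        # Select rows i1..ik and columns j1..jk of [T], then take the determinant.
        sub = [[T[i - 1][j - 1] for j in cols] for i in rows]
        coeffs[cols] = det(sub)
    return coeffs

T = [[1, 0, 0, 1],
     [0, 1, 0, 1],
     [0, 0, 1, 1]]
b = pullback_coefficients(T, rows=(2, 3))
```

Here `b[(3, 4)]` is the coefficient of $dx_3 \wedge dx_4$, and so on; the nonzero entries should reproduce Equation A21.7.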


Proof. Since any k-form on $\mathbb{R}^n$ is of the form

\[ \sum_{1 \le j_1 < \dots < j_k \le n} b_{j_1, \dots, j_k} \, dx_{j_1} \wedge \dots \wedge dx_{j_k}, \tag{A21.8} \]

the only problem is to compute the coefficients. This is very analogous to
Equation 6.2.20 in the proof of Theorem 6.2.7:

\[ (T^* dy_{i_1} \wedge \dots \wedge dy_{i_k})(\vec{e}_{j_1}, \dots, \vec{e}_{j_k}) = (dy_{i_1} \wedge \dots \wedge dy_{i_k}) \bigl( T(\vec{e}_{j_1}), \dots, T(\vec{e}_{j_k}) \bigr). \tag{A21.9} \]

This is what we needed: $dy_{i_1} \wedge \dots \wedge dy_{i_k}$ selects the corresponding rows from
the matrix $[T(\vec{e}_{j_1}), \dots, T(\vec{e}_{j_k})]$, but this is precisely the matrix made up of the
columns $j_1, \dots, j_k$ of [T]. □
Pullback of a k-form field by a $C^1$ mapping

If $U \subset \mathbb{R}^n$, $V \subset \mathbb{R}^m$ are open subsets, and $f : U \to V$ is a $C^1$ mapping, then we
can use f to pull back k-form fields on V to k-form fields on U. The definition
is similar to Definition A21.1, except that we must replace f by its derivative.

Definition A21.4 (Pullback by a $C^1$ mapping). If $\varphi$ is a k-form field
on V, and $f : U \to V$ is a $C^1$ mapping, then $f^*\varphi$ is the k-form field on U
defined by

\[ (f^*\varphi) \bigl( P_x(\vec{v}_1, \dots, \vec{v}_k) \bigr) = \varphi \bigl( P_{f(x)}([Df(x)]\vec{v}_1, \dots, [Df(x)]\vec{v}_k) \bigr). \tag{A21.10} \]


Note that if U were a bounded subset of $\mathbb{R}^2$, then Equation 6.3.7 says exactly,
and by the same computation, that

\[ \int_{f(U)} y_2 \, dy_1 \wedge dy_3 = \int_U 4 x_1^2 x_2^2 \, |dx_1 \, dx_2|. \]


If k = n, so that f(U) can be viewed as a parametrized domain, then our
definition of the integral over a parametrized domain, Equation 6.3.7, is

\[ \int_{f(U)} \varphi = \int_U f^*\varphi. \tag{A21.11} \]

Thus we have been using pullbacks throughout Chapter 6.


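Example A21.5 (Pullback by a $C^1$ mapping). Let $f : \mathbb{R}^2 \to \mathbb{R}^3$ be given
by

\[ f \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1^2 \\ x_1 x_2 \\ x_2^2 \end{pmatrix}. \tag{A21.12} \]

We will compute $f^*(y_2 \, dy_1 \wedge dy_3)$. Certainly

\[ f^*(y_2 \, dy_1 \wedge dy_3) = b \, dx_1 \wedge dx_2 \tag{A21.13} \]

for some function b, and the object is to compute that function:

\[ b \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
   = f^*(y_2 \, dy_1 \wedge dy_3) \bigl( P_x(\vec{e}_1, \vec{e}_2) \bigr)
   = (y_2 \, dy_1 \wedge dy_3) \left( P_{f(x)} \left( \begin{bmatrix} 2x_1 \\ x_2 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ x_1 \\ 2x_2 \end{bmatrix} \right) \right)
   = x_1 x_2 \det \begin{bmatrix} 2x_1 & 0 \\ 0 & 2x_2 \end{bmatrix} = 4 x_1^2 x_2^2. \tag{A21.14} \]

So

\[ f^*(y_2 \, dy_1 \wedge dy_3) = 4 x_1^2 x_2^2 \, dx_1 \wedge dx_2. \qquad \triangle \tag{A21.15} \]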
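Definition A21.4 can also be evaluated numerically at a point. The sketch below is an illustration only, with the Jacobian of the map f of Example A21.5 typed in by hand; the function names are hypothetical. It computes the coefficient b(x) by feeding the columns of [Df(x)] to $y_2 \, dy_1 \wedge dy_3$.

```python
def f(x1, x2):
    """The map of Example A21.5, f(x1, x2) = (x1^2, x1*x2, x2^2)."""
    return (x1 ** 2, x1 * x2, x2 ** 2)

def jacobian_f(x1, x2):
    """[Df(x)]: rows are the components of f, columns the partials in x1, x2."""
    return [[2 * x1, 0.0],
            [x2, x1],
            [0.0, 2 * x2]]

def pullback_coefficient(x1, x2):
    """b(x), where f^*(y2 dy1 ^ dy3) = b dx1 ^ dx2, evaluated on P_x(e1, e2)."""
    y = f(x1, x2)
    J = jacobian_f(x1, x2)
    # dy1 ^ dy3 selects rows 1 and 3 of [Df(x)] and takes the determinant.
    minor = J[0][0] * J[2][1] - J[0][1] * J[2][0]
    return y[1] * minor
```

For instance, `pullback_coefficient(2.0, 3.0)` can be compared with $4 x_1^2 x_2^2$ at the point (2, 3).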
Pullbacks and compositions

The pullback behaves nicely under composition: $(S \circ T)^* = T^* S^*$.

To prove Stokes’s theorem, we will need to compute with pullbacks. One thing
we will need to know is how pullbacks behave under composition. First let us
see what we find for pullbacks by compositions of linear transformations:

\[ (S \circ T)^*\varphi(\vec{v}_1, \dots, \vec{v}_k) = \varphi \bigl( (S \circ T)(\vec{v}_1), \dots, (S \circ T)(\vec{v}_k) \bigr) \]
\[ = S^*\varphi \bigl( T(\vec{v}_1), \dots, T(\vec{v}_k) \bigr) \tag{A21.16} \]
\[ = T^* S^*\varphi(\vec{v}_1, \dots, \vec{v}_k). \]

Thus $(S \circ T)^* = T^* S^*$. The same formula holds for pullbacks of form fields by
$C^1$ mappings, which should not be surprising in view of the chain rule.
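A quick numerical sanity check of $(S \circ T)^* = T^* S^*$ in the linear case is sketched below. The matrices S and T, the test vectors, and the choice of the 2-form $dz_1 \wedge dz_2$ are arbitrary values picked for illustration; forms are represented as plain Python functions of their vector arguments.

```python
def apply(M, v):
    """Matrix-vector product M v, with M a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in M]

def matmul(A, B):
    """Matrix product A B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def pullback(T, form):
    """T^* form: evaluate the form on the images T(v_1), ..., T(v_k)."""
    return lambda *vectors: form(*(apply(T, v) for v in vectors))

def dz1_wedge_dz2(u, w):
    """The 2-form dz1 ^ dz2 on R^2, evaluated on two vectors."""
    return u[0] * w[1] - u[1] * w[0]

S = [[1, 2, 0], [0, 1, 3]]      # hypothetical S : R^3 -> R^2
T = [[1, 0], [2, 1], [0, 4]]    # hypothetical T : R^2 -> R^3
v1, v2 = [1, 2], [3, -1]

lhs = pullback(matmul(S, T), dz1_wedge_dz2)(v1, v2)          # (S o T)^* phi
rhs = pullback(T, pullback(S, dz1_wedge_dz2))(v1, v2)        # T^* (S^* phi)
```

Note the order: the outer pullback in `rhs` is by T, the inner one by S, exactly as in Equation A21.16.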


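Proposition A21.6 (Compositions and pullbacks by nonlinear
maps). If $U \subset \mathbb{R}^n$, $V \subset \mathbb{R}^m$, and $W \subset \mathbb{R}^p$ are open, $f : U \to V$, $g : V \to W$
are $C^1$ mappings, and $\varphi$ is a k-form on W, then

\[ (g \circ f)^*\varphi = f^* g^*\varphi. \tag{A21.17} \]

The first, third, and fourth equalities in Equation A21.18 are the definition
of the pullback for $g \circ f$, g and f respectively; the second equality is the
chain rule.

Proof. This follows from the chain rule:

\[ (g \circ f)^*\varphi \bigl( P_x(\vec{v}_1, \dots, \vec{v}_k) \bigr) = \varphi \bigl( P_{g(f(x))}([D(g \circ f)(x)]\vec{v}_1, \dots, [D(g \circ f)(x)]\vec{v}_k) \bigr) \]
\[ = \varphi \bigl( P_{g(f(x))}([Dg(f(x))][Df(x)]\vec{v}_1, \dots, [Dg(f(x))][Df(x)]\vec{v}_k) \bigr) \]
\[ = g^*\varphi \bigl( P_{f(x)}([Df(x)]\vec{v}_1, \dots, [Df(x)]\vec{v}_k) \bigr) \]
\[ = f^* g^*\varphi \bigl( P_x(\vec{v}_1, \dots, \vec{v}_k) \bigr). \qquad \square \tag{A21.18} \]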


The pullback and wedge products

We will need to know how pullbacks are related to wedge products, and the
formula one might hope for is true.

Proposition A21.7 (Pullback and wedge products). If $U \subset \mathbb{R}^n$ and
$V \subset \mathbb{R}^m$ are open subsets, $f : U \to V$ is a $C^1$ mapping, and $\varphi$ and $\psi$ are a
k-form and an l-form on V respectively, then

\[ f^*(\varphi \wedge \psi) = f^*\varphi \wedge f^*\psi. \tag{A21.19} \]


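Proof. This is one of those proofs where you write down the definitions and
follow your nose. Let us spell it out when f = T is linear; we will leave the
general case as Exercise A21.4. Recall that the wedge product is a certain sum
over all permutations $\sigma$ of $\{1, \dots, k + l\}$ such that $\sigma(1) < \dots < \sigma(k)$ and
$\sigma(k+1) < \dots < \sigma(k+l)$; as in Definition 6.2.13, these permutations are denoted
Perm(k, l). We find

\[ T^*(\varphi \wedge \psi)(\vec{v}_1, \dots, \vec{v}_{k+l}) = (\varphi \wedge \psi) \bigl( T(\vec{v}_1), \dots, T(\vec{v}_{k+l}) \bigr) \]
\[ = \sum_{\sigma \in \mathrm{Perm}(k,l)} \mathrm{sgn}(\sigma) \, \varphi \bigl( T(\vec{v}_{\sigma(1)}), \dots, T(\vec{v}_{\sigma(k)}) \bigr) \, \psi \bigl( T(\vec{v}_{\sigma(k+1)}), \dots, T(\vec{v}_{\sigma(k+l)}) \bigr) \]
\[ = \sum_{\sigma \in \mathrm{Perm}(k,l)} \mathrm{sgn}(\sigma) \, T^*\varphi(\vec{v}_{\sigma(1)}, \dots, \vec{v}_{\sigma(k)}) \, T^*\psi(\vec{v}_{\sigma(k+1)}, \dots, \vec{v}_{\sigma(k+l)}) \]
\[ = (T^*\varphi \wedge T^*\psi)(\vec{v}_1, \dots, \vec{v}_{k+l}). \qquad \square \tag{A21.20} \]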


The exterior derivative is intrinsic.

The next theorem has the innocent appearance $d f^* = f^* d$. But this formula
says something quite deep, and although we could have written the proof of
Stokes’s theorem without mentioning the pullback, the step which uses this
result was very awkward.

Let us try to say why this result matters. To define the exterior derivative,
we used the parallelograms $P_x(\vec{v}_1, \dots, \vec{v}_k)$. For these parallelograms to exist
requires the linear structure of $\mathbb{R}^n$: we have to know how to draw straight lines
from one point to another.

It turns out that this isn’t necessary, and if we had used “curved
parallelograms” it would have worked as well. This is the real content of
Theorem A21.8.


This proof is thoroughly unsatisfactory: it doesn’t explain at all why the
result is true. It is quite possible to give a conceptual proof, but this
proof is as hard as (and largely a repetition of) the proof of Theorem 6.7.3.
That proof is quite difficult, and the present proof really builds on the work
we did there.

In Equation A21.22 we are using Theorem A20.2.


Theorem A21.8 (Exterior derivative is intrinsic). Let $U \subset \mathbb{R}^n$, $V \subset \mathbb{R}^m$
be open sets, and $f : U \to V$ be a $C^1$ mapping. If $\varphi$ is a k-form field
on V, then the exterior derivative of $\varphi$ pulled back by f is the same as the
pullback by f of the exterior derivative of $\varphi$:

\[ d(f^*\varphi) = f^*(d\varphi). \]


Proof. We will prove this theorem by induction on k. The case k = 0, where
$\varphi = g$ is a function, is an application of the chain rule:

\[ f^* dg \bigl( P_x(\vec{v}) \bigr) = dg \bigl( P_{f(x)}([Df(x)]\vec{v}) \bigr) = [Dg(f(x))][Df(x)]\vec{v} \]
\[ = [D(g \circ f)(x)]\vec{v} = d(g \circ f) \bigl( P_x(\vec{v}) \bigr) \tag{A21.21} \]
\[ = d(f^* g) \bigl( P_x(\vec{v}) \bigr). \]

If k > 0, it is enough to prove the result when we can write $\varphi = \psi \wedge dx_i$,
where $\psi$ is a (k − 1)-form. Then

\[ f^* d(\psi \wedge dx_i) = f^* \bigl( d\psi \wedge dx_i + (-1)^{k-1} \psi \wedge d\,dx_i \bigr) \tag{A21.22} \]
\[ = f^*(d\psi) \wedge f^* dx_i = d(f^*\psi) \wedge f^* dx_i, \]

whereas

\[ d f^*(\psi \wedge dx_i) = d(f^*\psi \wedge f^* dx_i) = \bigl( d(f^*\psi) \bigr) \wedge f^* dx_i + (-1)^{k-1} f^*\psi \wedge d(f^* dx_i) \tag{A21.23} \]
\[ = \bigl( d(f^*\psi) \bigr) \wedge f^* dx_i + (-1)^{k-1} f^*\psi \wedge d\,d f^* x_i = \bigl( d(f^*\psi) \bigr) \wedge f^* dx_i. \qquad \square \]

The $d(f^* dx_i)$ in the first line of Equation A21.23 becomes the $d\,d f^* x_i$ in
the second line. This substitution is allowed by induction (it is the case
k = 0) because $x_i$ is a function. In fact $f^* x_i = f_i$, the ith component of f.
Of course $d\,d f^* x_i = 0$ since it is the exterior derivative taken twice.
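As a concrete check of $d f^* = f^* d$, consider the following small example, which is not from the text: the map $f(x_1, x_2) = (x_1^2, \; x_1 x_2)$ from $\mathbb{R}^2$ to $\mathbb{R}^2$ and the 1-form $\varphi = y_1 \, dy_2$, both chosen only for illustration.

```latex
\begin{aligned}
f^*\varphi &= x_1^2 \, d(x_1 x_2) = x_1^2 x_2 \, dx_1 + x_1^3 \, dx_2, \\
d(f^*\varphi) &= x_1^2 \, dx_2 \wedge dx_1 + 3 x_1^2 \, dx_1 \wedge dx_2 = 2 x_1^2 \, dx_1 \wedge dx_2, \\
f^*(d\varphi) &= f^*(dy_1 \wedge dy_2)
   = \det \begin{bmatrix} 2x_1 & 0 \\ x_2 & x_1 \end{bmatrix} dx_1 \wedge dx_2 = 2 x_1^2 \, dx_1 \wedge dx_2.
\end{aligned}
```

The two computations take different routes (differentiate then pull back, or pull back then differentiate) and land on the same 2-form, as Theorem A21.8 predicts.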


A.22 Proof of Stokes’s Theorem

The proof of this theorem uses virtually every major theorem contained in
this book. Exercise A22.1 asks you to find as many as you can, and explain
where they are used.

Theorem 6.9.2 (Generalized Stokes’s theorem). Let X be a compact
piece-with-boundary of a (k + 1)-dimensional oriented manifold $M \subset \mathbb{R}^n$.
Give the boundary $\partial X$ of X the boundary orientation, and let $\varphi$ be a k-form
defined on a neighborhood of X. Then

\[ \int_{\partial X} \varphi = \int_X d\varphi. \tag{6.9.3} \]


Proposition A22.1 is a generalization of Proposition 6.9.7; here we allow
for corners.

A situation where the easy proof works

We repeat some of the discussion from Section 6.9, to make this proof
self-contained.

We will now describe a situation where the “proof” in Section 6.9 really does
work. In this simple case, we have a (k − 1)-form in $\mathbb{R}^k$, and the piece we will
integrate over is the first “quadrant.” There are no manifolds; nothing curvy.


It was in order to get Equation A22.2 that we required $\varphi$ to be of class
$C^2$, so that the second derivatives of the coefficients of $\varphi$ have finite
maxima.

The constant in Equation A20.15 (there called C, not K), comes from Theorem
A9.7 (Taylor’s theorem with remainder with explicit bound), and involves the
suprema of the second derivatives.

In Equation A20.15 we have $\le h^{k+2} K$ because there we are computing the
exterior derivative of a k-form; here we are computing the exterior
derivative of a (k − 1)-form.


Proposition A22.1. Let U be a bounded open subset of $\mathbb{R}^k$, and let $U_+$
be the part of U in the first quadrant, where $x_1 \ge 0, \dots, x_k \ge 0$. Orient U
by det on $\mathbb{R}^k$; $\partial U_+$ carries the boundary orientation. Let $\varphi$ be a (k − 1)-form
on $\mathbb{R}^k$ of class $C^2$, which vanishes identically outside U. Then

\[ \int_{\partial U_+} \varphi = \int_{U_+} d\varphi. \tag{A22.1} \]

Proof. Choose $\epsilon > 0$. Recall from Equation A20.15 (in the proof of Theorem
6.7.3 on computing the exterior derivative of a k-form) that
there exists a constant K and $\delta > 0$ such that when $|h| < \delta$,

\[ \Bigl| d\varphi \bigl( P_x(h\vec{e}_1, \dots, h\vec{e}_k) \bigr) - \int_{\partial P_x(h\vec{e}_1, \dots, h\vec{e}_k)} \varphi \Bigr| \le K h^{k+1}. \tag{A22.2} \]


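Denote by $\mathbb{R}^k_+$ the “first quadrant,” i.e., the subset where all $x_i \ge 0$. Take
the dyadic decomposition $D_N(\mathbb{R}^k)$, where $h = 2^{-N}$. By taking N sufficiently
large, we can guarantee that the difference between the integral of $d\varphi$ over $U_+$
and the Riemann sum is less than $\epsilon/2$:

\[ \Bigl| \int_{U_+} d\varphi - \sum_{C \in D_N(\mathbb{R}^k_+)} d\varphi(C) \Bigr| < \frac{\epsilon}{2}. \tag{A22.3} \]

The constant L depends on the size of the support of $\varphi$. More precisely, it
is the side-length of the support, to the kth power.

Now we replace the k-parallelograms of Equation A22.2 by dyadic cubes,
and evaluate the total difference between the exterior derivative of $\varphi$ over the
cubes C, and $\varphi$ over the boundaries of the C. The number of cubes of $D_N(\mathbb{R}^k)$
that intersect the support of $\varphi$ is at most $L 2^{kN}$ for some constant L, and since
$h = 2^{-N}$, the bound for each error is now $K 2^{-N(k+1)}$, so

\[ \Bigl| \sum_{C \in D_N(\mathbb{R}^k_+)} d\varphi(C) - \sum_{C \in D_N(\mathbb{R}^k_+)} \int_{\partial C} \varphi \Bigr| \le \underbrace{L 2^{kN}}_{\text{No. of cubes}} \; \underbrace{K 2^{-N(k+1)}}_{\text{bound for each error}} = L K 2^{-N}. \tag{A22.4} \]

This can also be made $< \epsilon/2$ by taking N sufficiently large; to be precise,
by taking

\[ N \ge \frac{\log 2LK - \log \epsilon}{\log 2}. \tag{A22.5} \]

Finally, all the internal boundaries in the sum

\[ \sum_{C \in D_N(\mathbb{R}^k_+)} \int_{\partial C} \varphi \tag{A22.6} \]

cancel, since each appears twice with opposite orientations. So (using C' to
denote cubes of the dyadic decomposition of $\partial \mathbb{R}^k_+$) we have

\[ \sum_{C \in D_N(\mathbb{R}^k_+)} \int_{\partial C} \varphi = \sum_{C' \in D_N(\partial \mathbb{R}^k_+)} \int_{C'} \varphi = \int_{\partial U_+} \varphi. \tag{A22.7} \]

Putting these inequalities together, we get

\[ \underbrace{\Bigl| \int_{U_+} d\varphi - \sum_{C \in D_N(\mathbb{R}^k_+)} d\varphi(C) \Bigr|}_{\le \epsilon/2} + \underbrace{\Bigl| \sum_{C \in D_N(\mathbb{R}^k_+)} d\varphi(C) - \sum_{C \in D_N(\mathbb{R}^k_+)} \int_{\partial C} \varphi \Bigr|}_{\le \epsilon/2} \le \epsilon, \tag{A22.8} \]

i.e.,

\[ \Bigl| \int_{U_+} d\varphi - \int_{\partial U_+} \varphi \Bigr| \le \epsilon. \tag{A22.9} \]

Since $\epsilon$ is arbitrary, the proposition follows. □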
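The choice of N in Equation A22.5 can be illustrated numerically. In the sketch below the values of L, K and $\epsilon$ are made up; in the proof they come from the support of $\varphi$ and from Theorem A9.7. The function names are likewise hypothetical.

```python
import math

def smallest_N(L, K, eps):
    """Smallest non-negative integer N with N >= (log 2LK - log eps) / log 2."""
    return max(0, math.ceil((math.log(2 * L * K) - math.log(eps)) / math.log(2)))

def total_error_bound(L, K, N):
    """(number of cubes) * (bound per cube) = L 2^{kN} * K 2^{-N(k+1)} = L K 2^{-N}."""
    return L * K * 2.0 ** (-N)

N = smallest_N(3.0, 5.0, 1e-3)   # hypothetical L = 3, K = 5, eps = 0.001
```

With these values `total_error_bound(3.0, 5.0, N)` falls below $\epsilon/2$, while the same bound for N − 1 does not. Note that the dimension k drops out of the product, which is why it is not a parameter.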





The sum $\sum_{i=1}^{N} \alpha_i(x) = 1$ of Equation A22.10 is called a partition
of unity, because it breaks up 1 into a sum of functions. These functions
have the interesting property that they have small support, which makes it
possible to piece together global functions, forms, etc., from local ones.
As far as we know, they are exclusively of theoretical use, never used in
practice.


The power 4 is used in Equation A22.14 to make sure that $\beta_R$ is of class
$C^2$; in Exercise A22.2 you are asked to show that it is of class $C^3$ on all
of $\mathbb{R}^k$. It evidently vanishes off the ball of radius R and, since
$4((1/2)^2 - 1)^4 = 324/256 > 1$, we have $\beta_R(x) > 1$ when $|x| \le R/2$. It is
not hard to manufacture something analogous of class $C^m$ for any m, and
rather harder but still possible to manufacture something analogous of class
$C^\infty$. But it is absolutely impossible to make anything of the sort with
functions that are sums of their Taylor series.


Partitions of unity 

To prove Stokes’s theorem, our tactic will be to reduce it to Proposition A22.1, 
by covering X with parametrizations that satisfy the requirements. Of course, 
this will mean cutting up X into pieces that are separately parametrized. This 
can be done as suggested above, but it is difficult. Rather than hacking X 
apart, we will use a softer technique: fading one parametrization out as we 
bring another in. The following lemma allows us to do this. 


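Lemma A22.2 (Partitions of unity). If $\alpha_i$, for $i = 1, \dots, N$, are smooth
functions on X such that

\[ \sum_{i=1}^{N} \alpha_i(x) = 1, \quad\text{then}\quad d\varphi = \sum_{i=1}^{N} d(\alpha_i \varphi). \tag{A22.10} \]

Proof. This is an easy but non-obvious computation. The thing not to do is
to write $d(\alpha_i \varphi) = d\alpha_i \wedge \varphi + \alpha_i \, d\varphi$; this leads to an awful mess. Instead
take advantage of Equation A22.10 to write

\[ \sum_{i=1}^{N} d(\alpha_i \varphi) = d \Bigl( \Bigl( \sum_{i=1}^{N} \alpha_i \Bigr) \varphi \Bigr) = d\varphi. \qquad \square \tag{A22.11} \]


This means that if we can prove Stokes’s theorem for the forms $\alpha_i \varphi$, then it
will be proved, since if

\[ \int_X d(\alpha_i \varphi) = \int_{\partial X} \alpha_i \varphi \tag{A22.12} \]

for each $i = 1, \dots, N$, then

\[ \int_X d\varphi = \sum_{i=1}^{N} \int_X d(\alpha_i \varphi) = \sum_{i=1}^{N} \int_{\partial X} \alpha_i \varphi = \int_{\partial X} \varphi. \tag{A22.13} \]

We will choose our $\alpha_i$ so that in addition to the conditions of Equation A22.10,
they have their supports in subsets $U_i$ in which M has the standard form of
Definition 6.6.1. It will be fairly easy to put these individual pieces into a form
where Proposition A22.1 can be applied.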


Choosing good parametrizations

Below we will need the “bump” function $\beta_R : \mathbb{R}^k \to \mathbb{R}$ given by

\[ \beta_R(x) = \begin{cases} 4 \Bigl( \dfrac{|x|^2}{R^2} - 1 \Bigr)^4 & \text{if } |x|^2 \le R^2 \\ 0 & \text{if } |x|^2 > R^2, \end{cases} \tag{A22.14} \]

and shown in Figure A22.1.
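The bump function of Equation A22.14 is easy to experiment with. The following minimal sketch (the function name `bump` and the list representation of points are choices made here, not the book's) evaluates $\beta_R$ at a point of $\mathbb{R}^k$.

```python
def bump(x, R):
    """beta_R(x) = 4 (|x|^2 / R^2 - 1)^4 on the ball of radius R, and 0 outside it."""
    r2 = sum(c * c for c in x)      # |x|^2 for a point given as a list of coordinates
    if r2 > R * R:
        return 0.0
    return 4.0 * (r2 / (R * R) - 1.0) ** 4
```

For example, with R = 2 the value at the center is 4, the value at distance R/2 = 1 from the center is 324/256, and the function vanishes on and outside the sphere of radius 2.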


Go back to the Definition 6.6.1 of a piece-with-boundary X of a manifold
$M \subset \mathbb{R}^n$.

For every $x \in X$, there exists a ball $U_x$ around x in $\mathbb{R}^n$ such that:

• $U_x \cap M$ is the graph of a mapping $f : U_1 \to E_2$, where $U_1 \subset E_1$ is an open
subset of the subspace spanned by k of the standard basis vectors, and $E_2$ is
the subspace spanned by the other n − k.

• There is a diffeomorphism $G : U_1 \to V \subset \mathbb{R}^k$ such that

\[ X \cap U_x = \left\{ \begin{pmatrix} u \\ f(u) \end{pmatrix} \;\Big|\; G_i(u) \ge 0, \; i = 1, \dots, k \right\}, \tag{A22.15} \]

where u denotes a point of $U_1$.

Figure A22.1. Graph of the bump function $\beta_R$ of Equation A22.14.

Since X is compact, we can cover it by finitely many $U_{x_1}, \dots, U_{x_N}$ satisfying
the properties above, by Theorem A17.2. (This is where the assumption that
X is compact is used, and it is absolutely essential.) In fact, we can require
that the balls with half the radius of the $U_{x_m}$ cover X. We will label $U_m = U_{x_m}$,
$U_1^m$, $f_m$, $G_m$ the corresponding sets and functions.

Call $R_m$ that half-radius, and let $\beta_m : \mathbb{R}^n \to \mathbb{R}$ be the function

\[ \beta_m(x) = \beta_{R_m}(x - x_m), \tag{A22.16} \]

so that $\beta_m$ is a $C^2$ function on $\mathbb{R}^n$ with support in the ball of radius $R_m$ around
$x_m$.

Set $\beta(x) = \sum_{m=1}^{N} \beta_m(x)$; this corresponds to a finite set of overlapping bump
functions, so that we have $\beta(x) > 0$ on a neighborhood of X. Then the functions

\[ \alpha_m(x) = \frac{\beta_m(x)}{\beta(x)} \tag{A22.17} \]

are $C^2$ on some neighborhood of X; clearly $\sum_{m=1}^{N} \alpha_m(x) = 1$ for all $x \in X$, so
that if we set $\varphi_m = \alpha_m \varphi$, we can write

\[ \varphi = \sum_{m=1}^{N} \alpha_m \varphi = \sum_{m=1}^{N} \varphi_m. \tag{A22.18} \]

Let us define

\[ h_m = f_m \circ (G_m)^{-1} : V_m \to M. \]

We have now cut up our manifold into adequate pieces: the forms $h_m^*(\alpha_m \varphi)$
satisfy the conditions of Proposition A22.1.
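The construction $\alpha_m = \beta_m / \beta$ can be tried out in one dimension. The sketch below takes X = [0, 1] with three bump functions; the centers and the common radius are invented for the illustration and are only required to make $\beta > 0$ on X.

```python
def bump(x, center, R):
    """One-dimensional beta_R(x - center), following Equation A22.14."""
    t = (x - center) ** 2 / (R * R)
    return 4.0 * (t - 1.0) ** 4 if t <= 1.0 else 0.0

centers = [0.0, 0.5, 1.0]   # hypothetical centers x_m covering X = [0, 1]
R = 0.6                     # hypothetical common radius R_m

def alphas(x):
    """The functions alpha_m(x) = beta_m(x) / beta(x), returned as a list."""
    betas = [bump(x, c, R) for c in centers]
    total = sum(betas)      # beta(x) > 0 on X, so the division is safe there
    return [b / total for b in betas]
```

At every point of [0, 1] the values returned by `alphas` are non-negative and add up to 1, which is all that Lemma A22.2 uses.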
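Completing the proof of Stokes’s theorem

The first equality uses $\sum \alpha_m = 1$.
The second is Equation A22.10.
The third says that $\alpha_m \varphi$ has its support in $U_m$,
the fourth that $h_m$ parametrizes $U_m$.
The fifth is the first crucial step, using $d h^* = h^* d$, i.e., Theorem A21.8.
The sixth, which is also a crucial step, is Proposition A22.1.
Like the fourth, the seventh is that $h_m$ parametrizes $U_m$, and for the
eighth we once more use $\sum \alpha_m = 1$.

The proof of Stokes’s theorem now consists of the following sequence of
equalities, in which $V^m_+$ denotes the part of $V_m$ in the first quadrant:

\[ \int_X d\varphi = \sum_{m=1}^{N} \int_X \alpha_m \, d\varphi = \sum_{m=1}^{N} \int_X d(\alpha_m \varphi) = \sum_{m=1}^{N} \int_{U_m \cap X} d\varphi_m = \sum_{m=1}^{N} \int_{V^m_+} h_m^*(d\varphi_m) \]
\[ = \sum_{m=1}^{N} \int_{V^m_+} d(h_m^* \varphi_m) = \sum_{m=1}^{N} \int_{\partial V^m_+} h_m^*(\varphi_m) = \sum_{m=1}^{N} \int_{\partial X} \varphi_m = \int_{\partial X} \varphi. \tag{A22.19} \]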
A5.1 Using the notation of Theorem 2.9.10, show that the implicit function
found by setting $g(y) = G \begin{pmatrix} 0 \\ y \end{pmatrix}$ is the unique continuous function defined on
$B_R(b)$ satisfying

\[ F \begin{pmatrix} g(y) \\ y \end{pmatrix} = 0 \quad\text{and}\quad g(b) = a. \]


A7.1 In the proof of Proposition 3.3.19, we start the induction at k = 1.
Show that you could start the induction at k = 0 and that, in that case, 
Proposition 3.3.19 contains Theorem 1.9.5 as a special case. 

A8.1 (a) Show that Proposition 3.4.4 (chain rule for Taylor polynomials) 

contains the chain rule as a special case. 

(b) Go back to Appendix Al (proof of the chain rule) and show how o and 
O notation can be used to shorten the proof. 

A9.1 Let $f \begin{pmatrix} x \\ y \end{pmatrix} = e^{\sin(x + y^2)}$. Use Maple, Mathematica, or similar software.

(a) Calculate the Taylor polynomial $P^k_{f,a}$ of degree k = 1, 2 and 4 at $a = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$.

(b) Estimate the maximum error $|P^k_{f,a} - f|$ on the region |x − 1| < .5 and
|y − 1| < .5, for k = 1, 2.

(c) Similarly estimate the maximum error in the region |x − 1| < .25 and
|y − 1| < .25, for k = 1, 2.
*A9.2 (a) Write the integral form of the remainder when sin(xy) is
approximated by its Taylor polynomial of degree 2 at the origin.

(b) Give an upper bound for the remainder when $x^2 + y^2 < 1/4$.


A9.3 Prove Equation 9.2 by induction, by first checking that when k = 0, it
is the fundamental theorem of calculus, and using integration by parts to prove

\[ \frac{1}{k!} \int_0^h (h - t)^k g^{(k+1)}(a + t) \, dt = \frac{1}{(k+1)!} \, g^{(k+1)}(a) \, h^{k+1} + \frac{1}{(k+1)!} \int_0^h (h - t)^{k+1} g^{(k+2)}(a + t) \, dt. \]

A12.1 This exercise sketches another way to find the constant in Stirling’s
formula. We will show that if there is a constant C such that

\[ n! = C \sqrt{n} \Bigl( \frac{n}{e} \Bigr)^n (1 + o(1)), \]

as is proved in Theorem A12.1, then $C = \sqrt{2\pi}$. The argument is fairly
elementary, but not at all obvious. Let $c_n = \int_0^\pi \sin^n x \, dx$.

(a) Show that $c_n < c_{n-1}$ for all n = 1, 2, ....

(b) Show that $c_n = \frac{n-1}{n} c_{n-2}$. Hint: write $\sin^n x = \sin x \, \sin^{n-1} x$ and
integrate by parts.

(c) Show that $c_0 = \pi$ and $c_1 = 2$, and use this and part (b) to show that

\[ c_{2n} = \frac{2n-1}{2n} \cdot \frac{2n-3}{2n-2} \cdots \frac{1}{2} \, \pi = \frac{(2n)! \, \pi}{2^{2n} (n!)^2}, \]

\[ c_{2n+1} = \frac{2n}{2n+1} \cdot \frac{2n-2}{2n-1} \cdots \frac{2}{3} \cdot 2 = \frac{2^{2n} (n!)^2 \, 2}{(2n+1)!}. \]

(d) Use Stirling’s formula with constant C to show that

\[ c_{2n} = \frac{1}{C} \sqrt{\frac{2}{n}} \, \pi \, (1 + o(1)), \qquad c_{2n+1} = \frac{C}{\sqrt{2n}} \, (1 + o(1)). \]

Now use part (a) to show that $C^2 \le 2\pi + o(1)$ and $C^2 \ge 2\pi + o(1)$.

A18.1 Show that there exists a continuous function $f : \mathbb{R} \to \mathbb{R}$, bounded
with bounded support (and in particular integrable), such that the set

\[ \{ x \in \mathbb{R} \mid f(x) > 0 \} \]

is not pavable. For instance, follow the following steps.

(a) Show that if $X \subset \mathbb{R}^n$ is any non-empty subset, then the function

\[ f_X(x) = \inf_{y \in X} |x - y| \]

is continuous. Show that $f_X(x) = 0$ if and only if $x \in \overline{X}$.

(b) Take any non-pavable closed subset $X \subset [0, 1]$, such as the complement
of the set $U_\epsilon$ that is constructed in Example 4.4.2, and let $X' = X \cup \{0, 1\}$.
Set

\[ f(x) = \chi_{[0,1]}(x) \, f_{X'}(x). \]

Show that this function f satisfies our requirements.

A18.2 Make a list $a_1, a_2, \dots$ of the rationals in [0, 1]. Consider the function
$f_k$ such that

\[ f_k(x) = 0 \text{ if } x \notin [0, 1], \text{ or if } x \in \{a_1, \dots, a_k\}; \]
\[ f_k(x) = 1 \text{ if } x \in [0, 1] \text{ and } x \notin \{a_1, \dots, a_k\}. \]

Show that all the $f_k$ are integrable, and that $f(x) = \lim_{k \to \infty} f_k(x)$ exists for
every x, but that f is not integrable.

A18.3 Show that if f and g are any integrable functions on $\mathbb{R}^n$, then

\[ \Bigl( \int_{\mathbb{R}^n} f(x) g(x) \, |d^n x| \Bigr)^2 \le \Bigl( \int_{\mathbb{R}^n} (f(x))^2 \, |d^n x| \Bigr) \Bigl( \int_{\mathbb{R}^n} (g(x))^2 \, |d^n x| \Bigr). \]

Hint: follow the proof of Schwarz’s inequality (Theorem 1.4.6). Consider the
quadratic polynomial

\[ \int_{\mathbb{R}^n} \bigl( (f + tg)(x) \bigr)^2 \, |d^n x| = \int_{\mathbb{R}^n} (f(x))^2 \, |d^n x| + 2t \int_{\mathbb{R}^n} f(x) g(x) \, |d^n x| + t^2 \int_{\mathbb{R}^n} (g(x))^2 \, |d^n x|. \]

Since the polynomial is $\ge 0$, its discriminant is non-positive.

A20.1 Show that the last equality of Equation A20.7 is “by the definition 
of the wedge product.” 

A20.2 Prove Theorem A20.2 concerning the derivative of wedge product:

(a) Show it for 0-forms, i.e.,

\[ d(fg) = f \, dg + g \, df. \]

(b) Show that it is enough to prove the theorem when

\[ \varphi = a(x) \, dx_{i_1} \wedge \dots \wedge dx_{i_k}; \]

\[ \psi = b(x) \, dx_{j_1} \wedge \dots \wedge dx_{j_l}. \]

(c) Prove the case in (b), using that

\[ \varphi \wedge \psi = a(x) b(x) \, dx_{i_1} \wedge \dots \wedge dx_{i_k} \wedge dx_{j_1} \wedge \dots \wedge dx_{j_l}. \]

A21.3 (a) Show that the pullback $T^* : A^k(W) \to A^k(V)$ is linear.

(b) Now show that the pullback by a $C^1$ mapping is linear.

A21.4 Prove Proposition A21.7 when the mapping f is only assumed to be
of class $C^1$.
A22.1 Identify the theorems used to prove Theorem 6.9.2, and show how
they are used.

A22.2 Show (proof of Lemma A22.2) that $\beta_R$ is of class $C^3$ on all of $\mathbb{R}^k$.



Appendix B: Programs 


The programs given in this appendix can also be found at the web page
http://math.cornell.edu/~hubbard/vectorcalculus.

B.1 Matlab Newton Program

This program can be typed into the Matlab window, and saved as an m-file
called “Newton.m”. It was created by a Cornell undergraduate, Jon
Rosenberger. For explanations as to how to use it, see below.

The program evaluates the Jacobian matrix (derivative of the function)
symbolically, using the link of Matlab to Maple.


function [x] = newton(F, x0, iterations)
vars = '[';
for i = 1:length(F)
  iS = num2str(i);
  vars = [vars 'x' iS ' '];
  eval(['x' iS ' = sym(''x' iS ''');']);   % declare xn to be symbolic
end
vars = [vars ']'];
eval(['vars = ' vars ';']);
J = jacobian(F, vars);
x = x0;
for i = 1:iterations
  JJ = double(subs(J, vars, x.'));
  FF = double(subs(F, vars, x.'));
  x = x - inv(JJ) * FF
end


The semicolons separating the entries in the first square brackets mean that
they are column vectors; this is Matlab’s convention for writing column
vectors.

Use * to indicate multiplication, and ^ for power; if $f_1 = x_1 x_2^2 - 1$, and
$f_2 = x_2 - \cos x_1$, the first entry would be
[x1 * x2^2 - 1; x2 - cos(x1)].


The following two lines give an example of how to use this program.

EDU> syms x1 x2

EDU> newton([cos(x1)-x1; sin(x2)], [.1; 3.0], 3)

The first lists the variables; they must be called x1, x2, ..., xn; n may be
whatever you like. Do not separate by commas; if n = 3 write x1 x2 x3.

The second line contains the word newton and then various terms within
parentheses. These are the arguments of the function newton. The first
argument, within the first square brackets, is the list of the functions $f_1$ up to $f_n$
that you are trying to set to 0. Of necessity this n is the same n as for line one.
Each f is a function of the n variables, or some subset of the n variables. The
second entry, in the second square brackets, is the point at which to start
Newton’s method; in this example, $\begin{pmatrix} .1 \\ 3.0 \end{pmatrix}$. The third entry is the number of times
to iterate. It is not in brackets. The three entries are separated by commas.
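For readers without Matlab, the same iteration can be sketched in Python. This is an illustration under stated assumptions, not a translation of the program above: the Jacobian is typed in by hand rather than computed symbolically, the code handles only two equations in two unknowns, and the name `newton_2d` is invented here.

```python
import math

def newton_2d(F, J, x, iterations):
    """Newton's method for two equations in two unknowns.

    F(x) returns [f1, f2]; J(x) returns the 2x2 Jacobian as nested lists.
    """
    for _ in range(iterations):
        (a, b), (c, d) = J(x)
        f1, f2 = F(x)
        det = a * d - b * c
        # Solve J * step = F by Cramer's rule instead of forming inv(J).
        step = [(d * f1 - b * f2) / det, (a * f2 - c * f1) / det]
        x = [x[0] - step[0], x[1] - step[1]]
    return x

# The example from the text: f1 = cos(x1) - x1, f2 = sin(x2), started at (.1, 3.0).
F = lambda x: [math.cos(x[0]) - x[0], math.sin(x[1])]
J = lambda x: [[-math.sin(x[0]) - 1.0, 0.0], [0.0, math.cos(x[1])]]
root = newton_2d(F, J, [0.1, 3.0], 6)
```

With six iterations instead of three, the first coordinate settles at the solution of cos x = x and the second at $\pi$, to within rounding.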
B.2 Monte Carlo Program

Like the determinant program in Appendix B.3, this program requires a Pascal
compiler.

The nine lines beginning function Rand and ending end are a random number
generator.

The six lines beginning function randomfunction and ending end define a
random function that gives the absolute value of the determinant of a 2 x 2
matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$.

For an n x n matrix you would enter $n^2$ “seeds.” You can name them what you
like; if n = 3, you could call them x1, x2, ..., x9, instead of a, b, ..., i.
In that case you would write x1:=rand(seed); x2:=rand(seed) and so on. To
define the random function you would use the formula

\[ \det \begin{bmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{bmatrix}
   = a_1(b_2 c_3 - b_3 c_2) - a_2(b_1 c_3 - b_3 c_1) + a_3(b_1 c_2 - b_2 c_1). \]


program montecarlo;
const lengthofrun = 100000;
var S, V, x, intguess, varguess, stddev, squarerootlerun,
    errfifty, errninety, errninetyfive: longreal; i, seed, answer: longint;

function Rand(var Seed: longint): real;
{Generate pseudo random number between 0 and 1}
const Modulus = 65536;
      Multiplier = 25173;
      Increment = 13849;
begin
  Seed := ((Multiplier * Seed) + Increment) mod Modulus;
  Rand := Seed / Modulus
end;

function randomfunction: real;
var a, b, c, d: real;
begin
  a := rand(seed); b := rand(seed); c := rand(seed); d := rand(seed);
  randomfunction := abs(a*d - b*c);
end;

begin
  Seed := 21877;
  repeat
    S := 0; V := 0;
    for i := 1 to lengthofrun do
    begin
      x := randomfunction;
      S := S + x; V := V + sqr(x);
    end;
    intguess := S/lengthofrun; varguess := V/lengthofrun - sqr(intguess);
    stddev := sqrt(varguess);
    squarerootlerun := sqrt(lengthofrun);
    errfifty := 0.6745*stddev/squarerootlerun;
    errninety := 1.645*stddev/squarerootlerun;
    errninetyfive := 1.960*stddev/squarerootlerun;
    writeln('average for this run = ', intguess);
    writeln('estimated standard deviation = ', stddev);
    writeln('with probability 50', errfifty);
    writeln('with probability 90', errninety);
    writeln('with probability 95', errninetyfive);
    writeln('another run? 1 with new seed, 2 without');
    readln(answer);
    if (answer = 1) then
    begin
      writeln('enter a new seed, which should be an integer');
      readln(seed);
    end;
  until (answer = 0);   {entering 0 ends the program}
end.


Another example of using the Monte Carlo program:

In Pascal, $x^2$ is written sqr(x).

To compute the area inside the unit square and above the parabola $y = x^2$, you
would type

function randomfunction: real;
var x, y: real;
begin
  x := rand(seed); y := rand(seed);
  if (y - sqr(x) < 0) then randomfunction := 0
  else randomfunction := 1;
end
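The same experiment can be sketched in Python. This is an illustrative port rather than the book's program: it uses Python's built-in generator in place of the linear congruential generator Rand, and the function names are chosen here. It reports the average together with the 95 percent error estimate computed as in the Pascal code.

```python
import math
import random

def monte_carlo(random_function, length_of_run, seed):
    """Return (average, estimated standard deviation, 95% error estimate)."""
    rng = random.Random(seed)
    s = v = 0.0
    for _ in range(length_of_run):
        x = random_function(rng)
        s += x
        v += x * x
    mean = s / length_of_run
    variance = v / length_of_run - mean * mean
    stddev = math.sqrt(max(variance, 0.0))
    return mean, stddev, 1.960 * stddev / math.sqrt(length_of_run)

def above_parabola(rng):
    """1 if a random point of the unit square lies on or above y = x^2, else 0."""
    x, y = rng.random(), rng.random()
    return 0.0 if y - x * x < 0 else 1.0

estimate, stddev, err95 = monte_carlo(above_parabola, 100000, seed=21877)
```

The exact area is $\int_0^1 (1 - x^2) \, dx = 2/3$, so `estimate` should land within a few multiples of `err95` of that value.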
B.3 Determinant Program

Program determinant;

Const maxsize = 10;

Type matrix = record
       size: integer;
       coeffs: array [1..maxsize, 1..maxsize] of real;
     end;
     submatrix = record
       size: integer;
       rows, cols: array [1..maxsize] of integer;
     end;

Var M: matrix;
    S: submatrix;
    d: real;

Function det(S: submatrix): real;
Var tempdet: real;
    i, sign: integer;
    S1: submatrix;

  Procedure erase(S: submatrix; i, j: integer; var S1: submatrix);
  {S1 is S with its i-th column and j-th row removed}
  Var k: integer;
  begin {erase}
    S1.size := S.size-1;
    for k := S.size-1 downto i do S1.cols[k] := S.cols[k+1];
    for k := i-1 downto 1 do S1.cols[k] := S.cols[k];
    for k := S.size-1 downto j do S1.rows[k] := S.rows[k+1];
    for k := j-1 downto 1 do S1.rows[k] := S.rows[k];
  end;

begin {function det}
  If S.size = 1 then det := M.coeffs[S.rows[1], S.cols[1]]
  else begin
    tempdet := 0; sign := 1;
    for i := 1 to S.size do
    begin
      erase(S, i, 1, S1);
      tempdet := tempdet + sign*M.coeffs[S.rows[1], S.cols[i]]*det(S1);
      sign := -sign;
    end;
    det := tempdet;
  end;
end;

Procedure InitSubmatrix(Var S: submatrix);
Var k: integer;
begin
  S.size := M.size;
  for k := 1 to S.size do begin S.rows[k] := k; S.cols[k] := k end;
end;

Procedure InitMatrix;
begin {define M.size and M.coeffs any way you like} end;

Begin {main program}
  InitMatrix;
  InitSubmatrix(S);
  d := det(S);
  writeln('determinant = ',d);
end.
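
Read as a formula, the recursion in det appears to be the expansion of the determinant along the first remaining row. The sketch below restates it; the notation A_{1,i} for the matrix with row 1 and column i deleted is introduced here for illustration and is not taken from the listing.

```latex
\[
\det A \;=\; \sum_{i=1}^{n} (-1)^{i+1}\, a_{1,i}\, \det A_{1,i},
\qquad \det\,[a] = a .
\]
```

Each call on an n x n submatrix makes n recursive calls on (n-1) x (n-1) submatrices, so the recursion reaches the 1 x 1 base case n! times; with maxsize = 10 that is already 10! = 3,628,800.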
complement, 5
completing squares, 292 
algorithm for, 293 
proof, 618-619 
complex number, 14-17 
absolute value of, 16 
addition of, 14 

and fundamental theorem of algebra, 95 
length of, 16 
multiplication of, 15 
and vector spaces, 189


composition, 50 

and chain rule, 118 
diagram for, 195 
and matrix multiplication, 57 
computer graphics, 275 
computer, 201 

concrete to abstract function, 193 
connected, 268 
conservative, 555, 569 
conservative vector field, 569, 571 
constrained critical point, 304 
constrained extrema, 304 

finding using derivatives, 304 
contented set, see pavable set, 361 
continuity, 4, 72, 85 
rules for, 86 

continuously differentiable function, 122, 124 
criterion for, 124 
continuum hypothesis, 13 
convergent series, 11
of vectors, 87 

convergent subsequence, 89 

existence in compact set, 89
convex domain, 571 
correlation coefficient, 369 
cosine law, 60 
degenerate critical point, 302
degenerate, nondegenerate, 297 
degrees of freedom, 261, 268 
del, 550 
density, 551 
density form, 519 
integrating, 521 
derivative, 100, 101, 108 

and Jacobian matrix, 108, 121 
in one dimension, 101 
in closed sets, 73 
in several variables, 105 
of composition, 119
reinterpreted, 544 
rules for computing, 115 
determinant, 66, 185, 405, 406, 407, 505 
effective formula for computing, 411 
how affected by column operations, 409 
of 2 x 2 matrix, 66 
of 3 x 3 matrix, 67 
of elementary matrix, 412 
of product of matrices, 411 


of transpose, 412, 413 
of triangular matrix, 414 
in R^3, geometric interpretation of, 71
in R^n, defined by properties, 406
independent of basis, 414 
measures volume, 420, 426 
proof of existence of, 632-635 
and right-hand rule, 71 
Determinant (Pascal program), 672-673 
diagonal matrix, 43 
diagonal, 40 

Dieudonné, Jean, 58, 286, 589
diffeomorphism, 446
differential equation, 313 
differential operators, 550 
dimension, 33, 195 
of subspace, 175 

dimension formula, 183, 184, 188 
direct basis, 512 

directional derivative, 104, 121, 554 
and Jacobian matrix, 109 
Dirichlet, 291 
discriminant, 19, 62 
div see divergence 
divergence, 550, 553, 556 

geometric interpretation of, 556 
divergence theorem, 566-567 
divide and average, 198 
domain, 47 

dominated convergence theorem, 352, 442 
proof, 643-648 
Dorier, Jean-Luc, 185 
dot product, 58, 59, 554 

geometric interpretation of, 60 
not always available, 554 
and projections, 61 
double integral, 352 
dyadic cube, definition, 356 
volume of, 356 
dyadic paving, 356, 404 
and Riemann sums, 355 
dynamical systems, 197
chaotic behavior of, 383 

Eberlein, 644 
echelon form, 151, 154 
eigenvalue, 313 
eigenvector, 173, 313 

finding using Lagrange multipliers, 314 
Einstein, 100 
electric flux, 469 

electromagnetic field, 34, 291, 316, 523 
electromagnetic potential, 572 
electromagnetism, 203, 499, 517, 572 
element of (∈), 5
element of angle, 548 
element of arc length, 479 
elementary form, measures signed volume, 502 
elementary matrices, 163 
invertible, 164 
multiplication by, 164 
empty set (∅), 5
epsilon-delta proofs, 77, 78 
equations versus parametrizations, 272 
error function, 447 
Euclid, 5 

Euclidean norm, 59 

Euler, Leonhard, 29, 165, 166, 185, 291 
even function, 287 
even permutations, 416
event, 363
existence of solutions, 49, 177, 168, 183, 184 
expectation, 366 

can be misleading, 367 

exterior derivative, 499, 500, 544, 545, 553, 660 
commuting diagram illustrating, 553 
computing, 551, 546-547 
proof of rules for computing, 652-655 
taken twice is 0, 549 
exterior product, 509 
extremum, definition, 299 

Faraday, 572 
feedback, 52, 100
Fermat’s little theorem, 291 


Fermat, 291 

field of general relativity, 34 
fields, 34 

finite dimensional, 196 
fluid dynamics, 204 
flux, 501, 551, 556 
flux form field, 518 
integrating, 521 

force fields, 554
forms, 427, 499, 557

form fields, 34, 500 

Fortran, 409 

Fourier, Joseph, xi 

Fourier transform, 436 

fractal, 491 

fractional dimension, 491 

Frenet formulas, 330 

Frenet frame, 329, 330 

Fubini’s theorem, 279, 387, 395, 437, 606 

and computing probabilities, example, 393 
and improper integrals, 444 
proof, 627-629 
function, 47 

fundamental theorem of algebra, 17, 95 
proof of, 96 

fundamental theorem of calculus, 499, 501, 544, 556

proof of, 559 

Galois, Evariste, 96 
gauge theory, 572 
Gauss, 3, 17, 96, 291, 564 
Gauss’s theorem (divergence theorem), 566 
Gaussian bell curve, 446 
Gaussian curvature, 321, 322, 324 
Gaussian elimination, 185 
Gaussian integral, 446 
Gaussian integration, 399 
Gaussian rules, 398 
general relativity, 203 
generalized Stokes’s theorem, 556-563 
geometric series, 11, 88 
of matrices, 87 
geometry of curves

in R^3 parametrized by arc length, 329
parametrized by arc length, 320
global Lipschitz ratio, 202
Gödel, 13
grad, see gradient 
gradient, 500, 550, 553-555, 569
dependent on dot product, 554 
geometric interpretation of, 550 
transpose of derivative, 554 
graph theory, 43 
gravitation, 316, 568 
gravitation field, 34, 518 
gravitational force field, 555 
greatest lower bound, 92, 93, 354 
Greek alphabet, 2 
Greeks, 96 
Green, 564 

Green’s theorem, 563-564
group homomorphism, 415 

Hadamard, 286
Hamilton’s quaternions, 15 
Hausdorff, Felix, 491, 644 
Heine-Borel theorem, 643 
Heisenberg, 36, 313 
Hermite, 120 

higher partial derivatives, 203 
Hilbert, David, 313 
holes, in domain, 568, 571 
homogeneous, 181 
homology theory, 536 
l’Hôpital’s rule, 275, 340

i (standard basis vector), 33 

I-integrable see improper integrals 

identically, 121 
identity, 156 
identity matrix, 40 
image, 177, 178, 183, 184 
basis for, 179 


imaginary part, 14 
implicit function, derivative of, 228 
implicit function theorem, 217, 259, 266, 270 
proof of, 603 

improper integrals, 436-440 
and change of variables, 446 
and Fubini’s theorem, 444 
independence of path, 569 
inequalities, 203 
inf, see infimum 
infimum, 93 
infinite sets, 12 

infinite-dimensional vector spaces, 191
infinity, 13 

countable and uncountable, 13 
initial guess, 198, 207, 592 
injective (one to one), 49, 178 
integers, 6

integrability, criteria for, 372, 373, 378-380 
of continuous function on R^n with bounded support, 378
of function continuous except on set of volume 0, 379
of functions continuous except on set of measure 0, 380, 384
integrable function, definition, 358 
integrable, locally, 439 
integrals, numerical computation of, 395 
integrand, 393, 469 
integration, 469 

in two variables, Simpson’s rule, 400, 401 
in several variables, probabilistic methods, 402

in several variables, product rules, 400 
of 0-form, 536 
of density form, 521 
of flux form, 521 
of work form, 520 
over oriented domains, 512 
interpolation, 275
intersection (∩), 5
intuitionists, 92 
invariants, 317 
inverse of a matrix, 40, 41, 161
computing, 161 

in solving linear equations, 161 
of product, 42 
of 2 x 2 matrix, 42 
only of square matrices, 161 
inverse function, 217 

global vs. local, 219, 220 
inverse function theorem, 156, 217, 219, 220, 226 
completed proof of, 598-601 
in higher dimensions, 219 
in one dimension, 218 
statement of, 220 

invertibility of matrices, 595 (see also inverse) 
invertible matrix, 41 (see also inverse of matrix) 
inward-pointing vector, 540 

j (standard basis vector), 33 
Jacobi, 3 

Jacobian matrix, 105, 107, 121 
Jordan product, 133 

k, 33
k-close, 9
k-form, 501
k-form field, 513

k-forms and (n - k)-forms, duality, 507
k-parallelogram in R^n, 470
volume of, 470, 471 

Kantorovitch theorem, 201, 206-209, 211, 214, 217 
proof of, 592-596 
stronger version of, 214 
Kelvin, Lord, 564 
kernel, 177, 178, 183 
basis for, 180, 181 
Klein, Felix, 250 
Koch snowflake, 491, 492 
Kronecker, 291 

Lagrange, 291 
Lagrange multipliers, 309 
lakes of Wada, 383 
Landau, 644 
Laplace, 96, 117 


Laplacian, 295, 556 
latitude, 430 

least upper bound, 7, 92, 354 
Lebesgue, 644

Lebesgue integration, 353, 381, 441, 644
Legendre, 291 
lemniscate, 429 
length 

of matrix, 64 
of vector, 59 
level curve, 254 
level set, 254, 255, 257 
as smooth curve, 257 
limit, 72 

of composition, 84 
of function, 81 

of mapping with values in R^m, 81
rules for computing, 79, 82, 84 
well defined, 78 

line integrals, fundamental theorem for, 563 
linear algebra, history of, 36, 39, 53, 66, 87, 165, 174, 185, 291
linear combination, 166, 192 
linear differential operator, 192 
linear equations, 154 

several systems solved simultaneously, 160 
solutions to, 155, 160 
linear independence, 166, 168, 170 
alternative definition of, 170-171 
geometrical interpretation of, 170 
linear transformation, 46, 51, 53, 190 
and abstract vector spaces, 190 
linearity, 52, 53 

and lack of feedback, 52 
linearization, 100 
linearly independent set, 173 
Lipschitz condition, 201-203, 593 
Lipschitz constant, see Lipschitz ratio 
Lipschitz ratio, 202, 203 
difficulty of finding, 203 
using higher partial derivatives, 203, 206 
little o, 286, 610 
local integrability, 439
loci see locus 
locus, 5 
longitude, 430 

lower integral, definition, 357 

main diagonal, 40, 163 
Mandelbrot, Benoit, 491 
manifold, 266, 268; definition, 269 
known by equations, 270 
orientation of, 530 
map, 47, 51, 100; well defined, 48
mapping see map 
Markov chain, 44 
matrices 

addition of, 35 
and graphs, 44 
importance of, 35, 43, 44, 46 
and linear transformations, 53 
multiplication of by scalars, 35 
and probabilities, 43 
matrix, 35, 313 
adjacency, 44 
diagonal, 43 
elementary, 163, 164 
invertible see matrix, invertible 
length of, 64 
norm of, 214 
permutation, 415 
size of, 35 

symmetric, 43; and quadratic form, 313 
transition, 44 
triangular, 43 
matrix, invertible, 41 

formula for inverse of 2 x 2, 42 
if determinant not zero, 411 
if row reduces to the identity, 161 
matrix inverse, 161 (see also inverse of matrix) 
matrix multiplication, 36-38, 52 
associativity of, 39, 57 
by a standard basis vector, 38 
not commutative, 40 
maximal linearly independent set, 172 


maximum, 92 

existence of, 93 
Maxwell’s equations, 203, 524 
mean absolute deviation, 367 
mean curvature, 321 
mean value theorem, 89, 94, 606 

for functions of several variables, 120-121 
measure, 372, 380 

measure 0, definition, 381; example, 381 
measure theory, 381 
minimal spanning set, 173, 175 
minimal surface, 321, 326 
minimum, 92, 93 
existence of, 93 
Minkowski norm, 64 
minors, 657 
Misner, Charles, 524 
mks units, 524 
modulus, 16 

Moebius, August, 174, 500 
Moebius strip, 500, 501 
Moliere, 47 
monomial, 86, 277 

monotone convergence theorem, 443, 644 
monotone function, definition, 217 
monotonicity, 219 

Monte Carlo methods of integration, 402, 403 
Monte Carlo program, 670 
Morse lemma, 303 
multi-exponent, 276, 277 
multilinearity, 406 
multiple integral, 353, 387 
rules for computing, 359 

N (natural numbers), 6 
nabla (∇), 550
natural domain, 74, 75 
Navier-Stokes equation, 204 
negation in mathematical statements, 4 
quantifiers in, 4-5 

negative definite quadratic form, 295 
nested partition, 405 
Newton program, 669 
Newton’s method, 58, 148, 197-201, 206-208, 216, 217, 223, 399, 592
chaotic behavior of, 211 
initial guess, 198, 201, 212 
superconvergence of, 212 
non-constructive, 92 
non-decreasing sequence, 11
non-uniform convergence, 441 
nondegenerate critical point, 302 
nondegenerate quadratic form, 297 
nonintegrable function, example of, 374
nonlinear equations, 100, 148, 197, 201 
nonlinear mappings, 100 
nonlinearity, 100 
nontrivial, 175 

norm of matrix, definition, 214 

difficulty of computing, 215, 216 
of multiples of the identity, 216 
normal distribution, 370 
normal number, 92 
normal (perpendicular), 63 
normalization, 398, 406 
notation, 29, 31, 33, 47, 354 
for partial derivatives, 102 
in Stokes’s theorem, 567 
of set theory, 5, 6 
nullity, 183 

o see little o 
O see big O 
odd function, 287 
odd permutations, 416 
one to one, 49, 178 
one variable calculus, 100 
one-sided inverse, 161 
onto, 49, 183 
open ball, 73 
open set, 72, 115 

importance of, 73 
notation for, 74 
orientation, 501 
compatible, 527 
importance of, 546 


of curve in R^n, 528
of k-dimensional manifold, 530
of open subset of R^3, 528
of point, 528
of surface in R^3, 528

orientation-preserving parametrizations, 532
nonlinear, 532 
of a curve, 531 
oriented boundary, 540 
of curve, 538 
of k-parallelogram, 542
of piece-with-boundary of R^2, 539
of piece-with-boundary of manifold, 537 
of piece-with-boundary of surface, 539 
oriented domains, 512 
oriented parallelogram, 512 
orthogonal, 63 

polynomials, 173 
orthogonality, 63 
orthonormal basis, 174 
oscillation (osc), 354, 373 
osculating plane, 329 
Ostrogradski, Michael, 564 
outward-pointing vector, 540

parallelepiped, 71 
parallelogram, area of, 66 
parameters, 268 
parametrization, 263, 473, 481 
by arc length, 320 
existence of, 477 

global, 263; (difficulty of finding), 263 
justifying change of, 648 
relaxed definition of, 474 
parametrizations, catalog of, 475-477 
parametrizations vs. equations, 265 
parametrized domains, 514 
partial derivative, 101, 103, 105 
notation for, 102 
for vector-valued function, 103
and standard basis vectors, 101, 102 
partial differential equations, 203 
partial fractions, 186-189 
partial row reduction, 152, 198, 232-233
partition of unity, 663 
Pascal, 409 

pathological functions, 108, 123 
pavable set, 361 
paving in R^n, definition, 404
boundary of, definition, 404 
Peano, 89 
Peano curves, 196 
permutation, 415, 416 
matrix, 415 

signature of, 414, 415, 416 
piece-with-boundary, 537
piecewise polynomial, 275 
Pincherle, Salvatore, 53 
pivot, 152 
pivotal 1, 151 
pivotal column, 154, 179 
pivotal unknown, 155 
plane curves, 250 
Poincaré, Henri, 556
Poincaré conjecture, 286
Poincaré lemma, 572
point, 29 

points vs. vectors, 29-31, 211 

polar angle, 16, 428 

polar coordinates, 428 

political polls, 403 

polynomial formula, 616 

positive definite quadratic form, 295 

potential, 570 

prime number theorem, 286 
prime, relatively, 242 
principal axis theorem, 313
probability density, 365 
probability measure, 363 
probability theory, 43, 447 
product rule, 399 
projection, 55, 61, 71 
proofs, when to read, 3, 589 
pullback, 656 

by nonlinear maps, and compositions, 659 
purely imaginary, 14 
Pythagorean theorem, 60 


quadratic form, 290, 292, 617 
degenerate, 297 
negative definite, 295 
nondegenerate, 297 
positive definite, 295 
rank of, 297 

quadratic formula, 62, 95, 291 
quantum mechanics, 204 
quartic, 20, 24, 25 

R see real numbers
I, 438 

R^m-valued mapping, 81 (see also vector-valued function)
random function, 366 
random number generator, 402, 403 
and code, 402 

random variable see random function 
range, 47, 178 

ambiguity concerning definition, 178 
rank, 183, 297 

of matrix equals rank of transpose, 185 
of quadratic form, 297 
rational function, 86 
real numbers, 6-12
arithmetic of, 9 
and round-off errors, 8 
real part (Re), 14 
relatively prime, 242 
resolvent cubic, 25, 26 
Riemann hypothesis, 286 
Riemann integration, 381 
Riemannian dominated convergence theorem, 644
Riesz, 644 

right-hand rule, 69, 70, 529, 539-541 
round-off errors, 8, 50, 153, 201 
row operations, 150 
row reduction, 148, 150, 151, 161 
algorithm for, 152 
by computers, 153 
cost of, 232 
partial, 198, 233 
and round-off errors, 153 
row space, 185
Russell, Bertrand, 13 
Russell’s paradox, 14 
saddle point, 255, 291, 301 
sample space, 363 
scalar, 32, 35 

Schrödinger wave equation, 204, 313
Schwarz, 62 

Schwarz’s inequality, 62, 63 
second partial derivative, 204, 278, 606 
second-order effects, 100 
sequence, 10, 87 

convergent, 10, 76 
series, 10 

convergent, 11 
set theory, 5, 12 
sewing (and curvature), 322 
Sierpinski gasket, 492 
signature 

classifying extrema, 301 
of permutation, 414-416 
of quadratic form, 291, 292, 301 
signed volume, 405, 426 
Simpson’s method, 396, 397, 400, 402, 485 
singularity, 514 
skew commutativity, 511 
slope, 101 

smooth (fuzzy definition of), 251
smooth curve 

in R^2, 250-251, 254
in R^3, 262

smooth surface, 257 (see also surface) 

soap bubbles, 250 

space average, 383 

spacetime, 316 

span, 166, 168 

row reducing to check, 167 
spectral theorem, 313 
spherical coordinates, 430 
splines, 275 

standard basis, 173, 174 
standard basis vectors, 33 

and choice of axes, 33
standard deviation (σ), 367
standard inner product, 58 


standard normal distribution function, 371
statcoulomb, 524 
state of system, 382 
statistical mechanics, 373, 382 
statistics, 366 
Stirling’s formula, 371, 622 
Stokes’s theorem, 564-566 
Stokes’s theorem, generalized, 536, 556-563 
informal proof of, 561 
proof of, 661-665 
importance of, 557 
strict parametrization, 474 
structure, preservation of, 51 
subsequence, 80 

existence of convergent, in compact set, 89 
subset of (⊂), 5
subspace, 33, 167; of R^n, 32
substitution method, 427
sum notation (Σ), 2
sums of squares, 291 
sup see supremum 

superconvergence, 212; proof of, 597-598
support (Supp), 354 
supremum, 92 
surface area, 483 

independent of parametrization, 485 
surface defined by equations, 259 
surface, 257 (see also smooth surface) 
surjective (onto), 49, 183
Sylvester’s principle of inertia, 313
symmetric bilinear function, 342 
symmetric matrix, 43 

and orthonormal bases of eigenvectors, 313 
and quadratic form, 313 

tangent line, to curve in R^2, 253
tangent plane to a smooth surface, 258 
tangent space 

to curve, 253-254, 261 
to manifold, 273 
to surface, 258 
tangent vector space, 273 
Taylor polynomial, 275, 282, 316 
painful to compute, 286 
rules for computing, 285 
Taylor’s theorem, 201
Taylor’s theorem with remainder 
in higher dimensions, 615 
in one dimension, 614 
theorem of the incomplete basis, 238 
theory of relativity, 499 
thermodynamics, 382, 383 
Thorne, Kip, 524 
Thurston, William, 17, 322 
topology, 89 

torsion, 329, 330; computing, 330 
total curvature, 495 
total degree, 277 
trace, 418 
transcendental, 23 
transformation, 51 
transition matrix, 44 
translation invariant, 378 
transpose, 42; of product, 43 
transposition, 415 
triangle inequality, 63 
triangular matrix, 43 
determinant of, 414 
trivial subspaces, 33 

Truman, Harry (and political polls), 403

uncountable infinite sets, 13 
uniform continuity, 4, 5, 86, 378, 441 
union (∪), 5

uniqueness of solutions, 49, 168, 177, 183, 184 

unit n-dimensional cube, 421 

units, 208, 524 

upper bound, 7 

upper integral, definition, 357 

vanish, 121 
variance, 367 
vector calculus, 499 
vector field, 34, 500, 551 

when gradient of function, 569, 571 
vector space, 189 
examples, 190 
vector, 29 (see also vectors) 
length of, 59 


vector-valued function, 103 
vectors 

angle between, 61, 63 
convergent sequence of, 76 
multiplication of by scalars, 32 
vs. points, 29-32 
Volterra, 556 
volume 0, 377, 379 
volume, n-dimensional, 356, 361 
volume, of dyadic cube, 356 
volume, signed, 426 

wedge, 502 

wedge product, 509, 511 
Weierstrass, 89 
well defined, 48 
Wheeler, J. Archibald, 524 
Whitney, Hassler, 73 
work, 551, 555, 569 
work form field, 517 
integrating, 520 
Zola, 44 



